Abstract — Compressed Sensing aims to capture attributes of $k$-sparse signals using very few measurements. In the standard Compressed Sensing paradigm, the $N \times C$ measurement matrix $\Phi$ is required to act as a near isometry on the set of all $k$-sparse signals (Restricted Isometry Property or RIP). If $\Phi$ satisfies the RIP, then Basis Pursuit or Matching Pursuit recovery algorithms can be used to recover any $k$-sparse vector $\alpha$ from the measurements $\Phi\alpha$. Although it is known that certain probabilistic processes generate $N \times C$ matrices that satisfy RIP with high probability, there is no practical algorithm for verifying whether a given sensing matrix $\Phi$ has this property, crucial for the feasibility of the standard recovery algorithms. In contrast, this paper provides simple criteria that guarantee that a deterministic sensing matrix satisfying these criteria acts as a near isometry on an overwhelming majority of $k$-sparse signals; in particular, most such signals have a unique representation in the measurement domain. Probability still plays a critical role, but it enters the signal model rather than the construction of the sensing matrix. An essential element in our construction is that we require the columns of the sensing matrix to form a group under pointwise multiplication. The construction allows recovery methods for which the expected performance is sub-linear in $C$, and only quadratic in $N$, as compared to the super-linear complexity in $C$ of the Basis Pursuit or Matching Pursuit algorithms; the focus on expected performance is more typical of mainstream signal processing than the worst-case analysis that prevails in standard Compressed Sensing. Our framework encompasses many families of deterministic sensing matrices, including those formed from discrete chirps, Delsarte-Goethals codes, and extended BCH codes.

Index Terms — Deterministic Compressed Sensing, Statistical 
Near Isometry, Finite Groups, Martingale Sequences, McDiarmid 
Inequality, Delsarte-Goethals Codes. 



I. Introduction and Notations 

The central goal of compressed sensing is to capture attributes of a signal using very few measurements. In most work to date, this broader objective is exemplified by the important special case in which a $k$-sparse vector $\alpha \in \mathbb{R}^C$ (with $C$ large) is to be reconstructed from a small number $N$ of linear measurements with $k < N < C$. In this problem, the measurement data constitute a vector $f = N^{-1/2}\Phi\alpha$, where $\Phi$ is an $N \times C$ matrix called the sensing matrix. Throughout this paper we shall use the notation $\varphi_j$ for the $j$-th column of the sensing matrix $\Phi$; its entries will be denoted by $\varphi_j(x)$ (with label $x$ varying from 1 to $N$). In other words, $\varphi_j(x)$ is the element of $\Phi$ in the $x$-th row and $j$-th column.

The work of R. Calderbank and S. Jafarpour is supported in part by NSF under grant DMS 0701226, by ONR under grant N00173-06-1-G006, and by AFOSR under grant FA9550-05-1-0443.



The two fundamental questions in compressed sensing are: how to construct suitable sensing matrices $\Phi$, and how to recover $\alpha$ from $f$ efficiently; it is also of practical importance to be resilient to measurement noise and to be able to reconstruct (approximations to) $k$-compressible signals, i.e. signals that have more than $k$ nonvanishing entries, but where only $k$ entries are significant and the remaining entries are close to zero.

The work of Donoho [9] and of Candes, Romberg and Tao [10], [2], [11] provides fundamental insight into the geometry of sensing matrices. This geometry is expressed by e.g. the Restricted Isometry Property (RIP), formulated by Candes and Tao [10]: a sensing matrix satisfies the $k$-Restricted Isometry Property if it acts as a near isometry on all $k$-sparse vectors; to ensure unique and stable reconstruction of $k$-sparse vectors, it is sufficient that $\Phi$ satisfy $2k$-RIP. When $N/C$ and/or $k/N$ are (very) small, deterministic RIP matrices have been constructed using methods from approximation theory [12] and coding theory [13]. More attention has been paid to probabilistic constructions where the entries of the sensing matrix are generated by an i.i.d. Gaussian or Bernoulli process or from random Fourier ensembles, in which larger values of $N/C$ and/or $k/N$ can be considered. These sensing matrices are known to satisfy the $k$-RIP with high probability [9], [10] and the number $N$ of measurements is $k \log(C/k)$. This is best possible in the sense that approximation results of Kashin [14] and Gluskin [15] imply that $\Omega(k \log(C/k))$ measurements are required for sparse reconstruction using $\ell_1$-minimization methods. Constructions of random sensing matrices of similar size that have the RIP but require a smaller degree of randomness are given by several approaches including filtering [16], [17] and expander
graphs Ha, Q, 0, 0. 

The role of random measurement in compressive sensing 
can be viewed as analogous to the role of random coding 
in Shannon theory. Both provide worst case performance 
guarantees in the context of an adversarial signal/error model. 
Random sensing matrices are easy to construct, and are $2k$-RIP
with high probability. As in coding theory, this randomness has 
its drawbacks, briefly described as follows: 

• First, efficiency in sampling comes at the cost of complexity 
in reconstruction (see Table 1) and at the cost of error in signal 
approximation (see Section 5). 

• Second, storing the entries of a random sensing matrix may 
require significant space, in contrast to deterministic matrices 
where the entries can often be computed on the fly without 
requiring any storage. 
TABLE I

Properties of $k$-sparse reconstruction algorithms that employ random sensing matrices with $N$ rows and $C$ columns. The property RIP-1 is the counterpart of RIP for the $\ell_1$ metric and it provides guarantees on the performance of sparse reconstruction algorithms that employ linear programming (t). Note that explicit construction of the expander graphs requires a large number of measurements, and that more practical alternatives are random sparse matrices which are expanders with high probability.
o Is) provides an algorithm with smaller constants that is easier to implement and analyze, whereas IS) is able to handle more
general noise models. 

• Third, there is no algorithm for efficiently verifying whether 
a sampled sensing matrix satisfies RIP, a condition that is 
essential for the recovery guarantees of the Basis Pursuit and 
Matching Pursuit algorithms on any sparse signal. 
These drawbacks lead us to consider constructions with deterministic sensing matrices, for which the performance is guaranteed in expectation only, for $k$-sparse signals that are random variables, but which do not suffer from the same drawbacks. The framework presented here provides

• easily checkable conditions on special types of deterministic sensing matrices guaranteeing successful recovery of all but an exponentially small fraction of $k$-sparse signals;

• in many examples, the entries of these matrices can be 
computed on the fly without requiring any storage, and 

• recovery algorithms with lower complexities than Basis 
Pursuit and Matching Pursuit algorithms. 
To make this last point more precise, we note that Basis Pursuit and Matching Pursuit algorithms rely heavily on matrix-vector multiplication, and are super-linear with respect to $C$, the dimension of the data domain. The reconstruction algorithm for the framework presented here (see Section 5) requires only vector-vector multiplication in the measurement domain; as a result, its recovery time is only quadratic in the dimension of the measurement domain. We suggest that the role of the deterministic measurement matrices presented here for compressive sensing is analogous to the role of structured codes in communications practice: in both cases fast encoding and decoding algorithms are emphasized, and typical rather than worst case performance is optimized. We are not the only ones seeking inspiration in coding theory to construct deterministic matrices for compressed sensing; Table 2 gives



an overview of approaches in the literature that employ de- 
terministic sensing matrices, several of which are based on 
linear codes (cf. (T91 and |pT|) and provide expected-case 
rather than worst-case performance guarantees. It is important 
to note (see Table 2) that although the use of linear codes 
makes fast algorithms possible for sparse reconstruction, these 
are not always resilient to noise. Such non-resilience manifests 
itself in e.g. Reed-Solomon (RS) constructions fT\\ \ the RS 
reconstruction algorithm (the roots of which go back to 1795! - see [26], [27]) uses the input data to construct an error-locator polynomial; the roots of this polynomial identify the
signals appearing in the sparse superposition. Because the 
correspondence between the coefficients of a polynomial and 
its roots is not well conditioned, it is very difficult to deal 
with compressible signals and noisy measurements in RS- 
based approaches. 

Because we will be interested in expected-case performance only, we need not impose RIP; we shall instead work with the weaker Statistical Restricted Isometry Property. More precisely, we define

Definition 1. (($k, \epsilon, \delta$)-StRIP matrix)

An $N \times C$ (sensing) matrix $\Phi$ is said to be a $(k, \epsilon, \delta)$-Statistical Restricted Isometry Property matrix [abbreviated $(k, \epsilon, \delta)$-StRIP matrix] if, for $k$-sparse vectors $\alpha \in \mathbb{R}^C$, the inequalities

$$(1 - \epsilon)\,\|\alpha\|^2 \;\le\; \left\| \frac{1}{\sqrt{N}}\,\Phi\alpha \right\|^2 \;\le\; (1 + \epsilon)\,\|\alpha\|^2, \tag{1}$$

hold with probability exceeding $1 - \delta$ (with respect to a uniform distribution of the vectors $\alpha$ among all $k$-sparse
TABLE II

Properties of $k$-sparse reconstruction algorithms that employ deterministic sensing matrices with $N$ rows and $C$ columns. Note that for LDPC codes $k \ll C$. Note also that RIP holds for random matrices, where it implies existence of a low-distortion embedding from $\ell_2$ into $\ell_1$. Guruswami et al. [18] proved that this property also holds for deterministic sensing matrices constructed from expander codes. It follows from Theorem 8 in this paper that sensing matrices based on discrete chirps and Delsarte-Goethals codes satisfy the UStRIP.
vectors in $\mathbb{R}^C$ of the same norm).¹

There is a slight wrinkle in that, unlike the simple RIP case, StRIP does not automatically imply unique reconstruction, not even with high probability. If an $N \times C$ matrix $\Phi$ is $(2k, \epsilon, \delta)$-StRIP, then, given a $k$-sparse vector $\alpha$, it does follow that $\Phi$ maps any other randomly picked $k$-sparse signal $\beta$ to a different image, i.e. $\Phi\alpha \ne \Phi\beta$, with probability exceeding $1 - \delta$ (with respect to the random choice of $\beta$). This does not mean, however, that uniqueness is guaranteed with high probability: requiring that the measure of $\{\alpha \in \mathbb{R}^C \,;\, \alpha \text{ is } k\text{-sparse and there is a different } k\text{-sparse } \beta \in \mathbb{R}^C \text{ for which } \Phi\alpha = \Phi\beta\}$ be small is a more stringent requirement than that the measure of $\{\beta \in \mathbb{R}^C \,;\, \beta \ne \alpha \text{ and } \Phi\alpha = \Phi\beta\}$ be small for all $k$-sparse $\alpha$. For this reason, we also introduce the following definition:

Definition 2. (($k, \epsilon, \delta$)-UStRIP matrix)

An $N \times C$ (sensing) matrix $\Phi$ is said to be a $(k, \epsilon, \delta)$-Uniqueness-guaranteed Statistical Restricted Isometry Property matrix [abbreviated $(k, \epsilon, \delta)$-UStRIP matrix] if $\Phi$ is a $(k, \epsilon, \delta)$-StRIP matrix, and

$$\{\beta \in \mathbb{R}^C \,;\, \beta \text{ is } k\text{-sparse and } \Phi\alpha = \Phi\beta\} = \{\alpha\}$$

with probability exceeding $1 - \delta$ (with respect to a uniform distribution of the vectors $\alpha$ among all $k$-sparse vectors in $\mathbb{R}^C$ of the same norm).

Again, we are not the first to propose a weaker version of RIP that permits the construction of deterministic sensing matrices. The construction by Guruswami et al. in [18] can

¹Throughout the paper, norms without subscript denote $\ell_2$-norms.



be viewed as another instance of a weakening of RIP, in the following different direction. RIP implies that $\Phi$ defines a low-distortion $\ell_2$-$\ell_1$ embedding that plays a crucial role
in the proofs of flOl, O]. In HS), Guruswami 
et al. prove that this $\ell_2$-$\ell_1$ embedding property also holds for deterministic sensing matrices constructed from expander codes. These matrices satisfy an "almost Euclidean null space property", that is, for any $\alpha$ in the null space of $\Phi$, the ratio $\sqrt{C}\,\|\alpha\|_2 / \|\alpha\|_1$ is bounded by a constant; this is their main tool to obtain the results reported in Table 2.

In this paper we formulate simple design rules, imposing that the columns of the sensing matrix form a group under pointwise multiplication, that all row sums vanish, that different rows are orthogonal, and requiring a simple upper bound on the absolute value of any column sum (other than the multiplicative identity). The properties we require are satisfied by a large class of matrices constructed by exponentiating codewords from a linear code; several examples are given in Section 2. In Section 3, we show that our relatively weak design rules are sufficient to guarantee that $\Phi$ is UStRIP, provided the parameters satisfy certain constraints. The group property makes it possible to avoid intricate combinatorial reasoning about coherence of collections of mutually unbiased bases (cf. [28]). Section 4 applies our results to the case where the sensing matrix is formed by taking random rows of the FFT matrix. In Section 5 we emphasize a particular family of constructions involving subcodes of the second order Reed-Muller code; in this case codewords correspond to multivariable quadratic functions defined over the binary field or the integers modulo 4. Section 6 provides a discussion regarding the noise resilience.
II. StRIP-able: Basic Definitions, with Several Examples

In this section we formulate three basic conditions and give examples of deterministic sensing matrices $\Phi$ with $N$ rows and $C$ columns that satisfy these conditions. Note that throughout the paper, we shall assume (without stating this again explicitly) that $\Phi$ has no repeated columns.

Definition 3. An $N \times C$ matrix $\Phi$ is said to be $\eta$-StRIP-able, where $\eta$ satisfies $0 < \eta \le 1$, if the following three conditions are satisfied:

• (St1) The rows of $\Phi$ are orthogonal, and all the row sums are zero, i.e.

$$\sum_{j=1}^{C} \varphi_j(x)\,\overline{\varphi_j(y)} = 0 \quad \text{if } x \ne y, \tag{2}$$

$$\sum_{j=1}^{C} \varphi_j(x) = 0 \quad \text{for all } x. \tag{3}$$

• (St2) The columns of $\Phi$ form a group under "pointwise multiplication", defined as follows:

for all $j, j' \in \{1, \ldots, C\}$, there exists a $j'' \in \{1, \ldots, C\}$ such that

$$\text{for all } x: \quad \varphi_j(x)\,\varphi_{j'}(x) = \varphi_{j''}(x). \tag{4}$$

In particular, there is one column of $\Phi$ for which all the entries are 1, and that acts as a unit for this group operation; this column will be denoted by $\mathbf{1}$. Without loss of generality, we will assume the columns of $\Phi$ are ordered so that $\varphi_1 = \mathbf{1}$, i.e. $\varphi_1(x) = 1$ for all $x$.

• (St3) For all $j \in \{2, \ldots, C\}$,

$$\left| \sum_{x} \varphi_j(x) \right|^2 \le N^{2-\eta}. \tag{5}$$

Remarks

1. Condition (5) applies to all columns except the first column (i.e. the column which consists of all ones).

2. The justification of the name StRIP-able will be given in the next section.

3. When the value of $\eta$ in (5) does not play a special role, we do not spell it out explicitly, and simply call $\Phi$ StRIP-able.
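
The three conditions can be checked mechanically for a small matrix. The following is a minimal sketch, not part of the original construction: the function name `is_strip_able`, the numerical tolerance, and the test matrix (a DFT matrix with its first row removed) are all illustrative assumptions. For a finite set of columns with nonzero entries, closure under pointwise multiplication together with the presence of the all-ones column is enough to give a group, which is what the sketch tests for (St2).

```python
import cmath

def is_strip_able(phi, eta, tol=1e-9):
    """Check (St1)-(St3) for phi, given as a list of N rows of C complex entries."""
    n_rows, n_cols = len(phi), len(phi[0])
    cols = [tuple(phi[x][j] for x in range(n_rows)) for j in range(n_cols)]

    def same(c1, c2):
        return all(abs(a - b) <= tol for a, b in zip(c1, c2))

    def contains(col):
        return any(same(col, c) for c in cols)

    # (St1): every row sums to zero and distinct rows are orthogonal.
    for x in range(n_rows):
        if abs(sum(phi[x])) > tol:
            return False
        for y in range(x + 1, n_rows):
            inner = sum(a * b.conjugate() for a, b in zip(phi[x], phi[y]))
            if abs(inner) > tol:
                return False

    # (St2): nonzero entries, all-ones column present, closed under products.
    ones = tuple(1 for _ in range(n_rows))
    if any(abs(a) <= tol for c in cols for a in c) or not contains(ones):
        return False
    for c1 in cols:
        for c2 in cols:
            if not contains(tuple(a * b for a, b in zip(c1, c2))):
                return False

    # (St3): squared column sums (identity column excluded) are at most N^(2 - eta).
    bound = n_rows ** (2 - eta)
    return all(abs(sum(c)) ** 2 <= bound + tol for c in cols if not same(c, ones))

# Illustrative test matrices (assumption): DFT of size 5 with and without row x = 0.
w = cmath.exp(2j * cmath.pi / 5)
punctured_dft = [[w ** (j * x) for j in range(5)] for x in range(1, 5)]
full_dft = [[w ** (j * x) for j in range(5)] for x in range(5)]
```

The punctured matrix passes all three tests, while the full DFT matrix fails (St1) because its first row sums to 5 rather than 0.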

The conditions (2)-(5) have the following immediate consequences:

Lemma 4. If the matrix $\Phi$ satisfies (4), then $|\varphi_j(x)| = 1$, for all $j$ and all $x$.

Proof: For every $x$, $\{\varphi_j(x)\}_{j=1}^{C}$ is a group of complex numbers under multiplication; all finite groups of this type consist of unimodular numbers. ■

Lemma 5. If the matrix $\Phi$ satisfies (4), then the collection of columns of $\Phi$ is closed under complex conjugation, i.e.

$$\text{for all } j \in \{1, \ldots, C\}, \text{ there exists a } j' \in \{1, \ldots, C\} \text{ such that, for all } x, \quad \varphi_{j'}(x) = \overline{\varphi_j(x)}. \tag{6}$$

Proof: Pick $j \in \{1, \ldots, C\}$. Since the columns of $\Phi$ form a group under pointwise multiplication, there is some $j' \in \{1, \ldots, C\}$ such that $\varphi_{j'}$ is the inverse of $\varphi_j$ for this group operation. Using Lemma 4 we have then, for all $x$, $\varphi_{j'}(x) = [\varphi_j(x)]^{-1} = \overline{\varphi_j(x)}$. ■

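Lemma 6. If the matrix $\Phi$ satisfies (2), (3) and (4), then the normalized columns $\left(N^{-1/2}\varphi_j\right)_{j=1}^{C}$ form a tight frame in $\mathbb{C}^N$, with redundancy $C/N$.

Proof: By Lemma 4 and (2) we have

$$\sum_{j=1}^{C} \varphi_j(x)\,\overline{\varphi_j(y)} = C\,\delta_{x,y},$$

i.e. $\Phi\Phi^{\dagger} = C\,I_N$, so that, for any vector $v \in \mathbb{C}^N$,

$$\sum_{j=1}^{C} |\langle v, \varphi_j \rangle|^2 = v^{\dagger}\,\Phi\Phi^{\dagger}\,v = C\,\|v\|^2.$$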

Lemma 7. If the matrix $\Phi$ satisfies (4), then the inner product of two columns $\varphi_j$ and $\varphi_{j'}$, defined as $\varphi_j \cdot \varphi_{j'} := \sum_x \varphi_j(x)\,\overline{\varphi_{j'}(x)}$, equals $N$ if and only if $j = j'$.

Proof: If $j = j'$, we obviously have $\varphi_j \cdot \varphi_{j'} = N$, by Lemma 4. If $\varphi_j \cdot \varphi_{j'} = N$, then we have, by Cauchy-Schwarz,

$$N = \varphi_j \cdot \varphi_{j'} \le |\varphi_j \cdot \varphi_{j'}| \le \|\varphi_j\|\,\|\varphi_{j'}\| = N,$$

implying that in this instance the Cauchy-Schwarz inequality must be an equality, so that $\varphi_{j'}$ must be some multiple of $\varphi_j$. Since $N = \varphi_j \cdot \varphi_{j'}$, the multiplication factor must equal 1, so that $\varphi_j = \varphi_{j'}$. Since $\Phi$ has no repeated columns, $j = j'$ follows. ■
We shall prove that StRIP-able matrices have (as their 
name already announces) a Restricted Isometry Property in 
a Statistical sense, provided the different parameters satisfy 
certain constraints, which will be made clear and explicit in the 
next section. Before we embark on that mathematical analysis, 
we show that there are many examples of StRIP-able matrices. 



A. Discrete Chirp Sensing Matrices 

Let $p$ be a prime and let $\omega$ be a primitive $p$-th (complex) root of unity. A length-$p$ chirp signal takes the form

$$\varphi(x) = \omega^{mx + rx^2}, \quad \text{where } x = 0, 1, \ldots, p - 1.$$

Here $m$ is the base frequency and $r$ is the chirp rate.
Consider now the family of chirp signals $\varphi_{mp+r}$, where $r, m = 0, 1, \ldots, p - 1$; the "extra" phase factor (usually not present in chirps) ensures that the row sums $\sum_{\ell} \varphi_{\ell}(x)$ vanish for all $x$. It is easy to check that this family satisfies (St1), (St2), and (St3) [23]. For the corresponding sensing matrix $\Phi$, Applebaum et al. [23] have analyzed an algorithm for sparse reconstruction that exploits the efficiency of the FFT in each of two steps: the first to recover the chirp rate and the second to recover the base frequency. The Gerschgorin Circle Theorem [29] is used to prove that the RIP holds for
sets of '•^^"'"^ columns. Numerical experiments reported in 
[23] compare the eigenvalues of deterministic chirp sensing matrices with those of random Gaussian sensing matrices. The singular values of restrictions to $k$-dimensional subspaces of $N \times C$ random Gaussian sensing matrices have a Gaussian distribution, with mean $\mu_{N,C,k}$ and standard deviation $\sigma_{N,C,k}$; the experiments show that, for the same values of $N$, $C$ and $k$, the singular values of restrictions of deterministic chirp sensing matrices have a similar spread around a central value $\mu \in (\mu_{N,C,k}, 1)$ that is closer to 1; in fact, the experiments suggest that $\mu - \mu_{N,C,k} > \sigma_{N,C,k}$.
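
To make the role of the extra phase factor concrete, here is a small numerical sketch. The exact form of that factor is not legible in this copy of the text, so the sketch assumes (hypothetically) the entries $\omega^{r}\,\omega^{rx^2 + mx}$, with columns ordered by the pair $(r, m)$; the helper names are illustrative. Under this assumption the row sums vanish for every $x$, including $x = 0$, and each non-identity column sum has squared modulus 0 or $p$, which corresponds to $\eta = 1$ in (St3).

```python
import cmath

def chirp_matrix(p):
    """p x p^2 matrix with entries w^(r + r*x^2 + m*x); column index is r*p + m.

    The leading factor w^r is an ASSUMED form of the 'extra' phase factor.
    """
    w = cmath.exp(2j * cmath.pi / p)
    return [[w ** ((r + r * x * x + m * x) % p)
             for r in range(p) for m in range(p)]
            for x in range(p)]

def row_sums(phi):
    return [sum(row) for row in phi]

def column_sums(phi):
    return [sum(row[j] for row in phi) for j in range(len(phi[0]))]

p = 7
phi = chirp_matrix(p)
max_row_sum = max(abs(s) for s in row_sums(phi))
# Column 0 is the all-ones identity column and is excluded from (St3).
max_col_sum_sq = max(abs(s) ** 2 for s in column_sums(phi)[1:])
print(max_row_sum, max_col_sum_sq)
```

Dropping the factor $\omega^{r}$ leaves every entry of the row $x = 0$ equal to 1, so that row would sum to $p^2$ instead of 0.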

B. Kerdock, Delsarte-Goethals and Second Order Reed Muller 
Sensing Matrices 

In our construction of deterministic sensing matrices based on Kerdock, Delsarte-Goethals and second order Reed Muller codes, we start by picking an odd number $m$. The $2^m$ rows of the sensing matrix $\Phi$ are indexed by the binary $m$-tuples $x$, and the $2^{(r+2)m}$ columns are indexed by the pairs $P, b$, where $P$ is an $m \times m$ binary symmetric matrix in the Delsarte-Goethals set $DG(m, r)$, and $b$ is a binary $m$-tuple. The entry $\varphi_{P,b}(x)$ is given by

$$\varphi_{P,b}(x) = i^{\,\mathrm{wt}(d_P) + 2\,\mathrm{wt}(b)}\; i^{\,xPx^{\top} + 2bx^{\top}}, \tag{7}$$

where $d_P$ denotes the main diagonal of $P$, and wt denotes the Hamming weight (the number of 1s in the binary vector). Note that all arithmetic in the expressions $xPx^{\top} + 2bx^{\top}$ and $\mathrm{wt}(d_P) + 2\,\mathrm{wt}(b)$ takes place in the ring of integers modulo 4, since they appear only as exponents for $i$. Given $P, b$, the vector $xPx^{\top} + 2bx^{\top}$ is a codeword in the Delsarte-Goethals code (defined over the ring of integers modulo 4). For a fixed matrix $P$, the $2^m$ columns $\varphi_{P,b}$, $b \in \mathbb{F}_2^m$, form (up to normalization) an orthonormal basis $\Gamma_P$ that can also be obtained by postmultiplying the Walsh-Hadamard basis by the unitary transformation $\mathrm{diag}[i^{\,xPx^{\top}}]$.

The Delsarte-Goethals set $DG(m, r)$ is a binary vector space containing $2^{(r+1)m}$ binary symmetric matrices with the property that the difference of any two distinct matrices has rank at least $m - 2r$ (see [30]). The Delsarte-Goethals sets are nested

$$DG(m, 0) \subset DG(m, 1) \subset \cdots \subset DG(m, (m-1)/2).$$

The first set $DG(m, 0)$ is the classical Kerdock set, and the last set $DG(m, (m-1)/2)$ is the set of all binary symmetric matrices. The $r$-th Delsarte-Goethals sensing matrix is determined by $DG(m, r)$ and has $N = 2^m$ rows and $C = 2^{(r+2)m}$ columns. The initial phase in (7) is chosen so that the Delsarte-Goethals sensing matrices satisfy (St1) and (St2). (See Appendix A).
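
For the largest set $DG(m, (m-1)/2)$, which consists of all binary symmetric matrices, the construction can be enumerated directly for small $m$. The sketch below is an illustration under stated assumptions: it takes the entry formula to be $i^{\mathrm{wt}(d_P) + 2\mathrm{wt}(b)}\, i^{xPx^{\top} + 2bx^{\top}}$, stores each column as a tuple of exponents modulo 4 (so all checks are exact integer arithmetic), and uses hypothetical helper names. It verifies closure under pointwise multiplication, vanishing row sums and row orthogonality for $m = 3$, where $N = 8$ and $C = 512$.

```python
from itertools import product

def dg_exponents(m):
    """Columns for all symmetric binary P and all b, as tuples of exponents of i."""
    xs = list(product((0, 1), repeat=m))
    upper = [(a, c) for a in range(m) for c in range(a, m)]
    cols = set()
    for bits in product((0, 1), repeat=len(upper)):
        P = [[0] * m for _ in range(m)]
        for (a, c), v in zip(upper, bits):
            P[a][c] = P[c][a] = v
        wt_d = sum(P[a][a] for a in range(m))
        for b in xs:
            cols.add(tuple(
                (wt_d + 2 * sum(b)
                 + sum(x[a] * P[a][c] * x[c] for a in range(m) for c in range(m))
                 + 2 * sum(bi * xi for bi, xi in zip(b, x))) % 4
                for x in xs))
    return cols

def counts_mod4(exponents):
    counts = [0, 0, 0, 0]
    for e in exponents:
        counts[e % 4] += 1
    return counts

def power_sum_is_zero(exponents):
    """Exact test that the sum of i**e over the exponents vanishes."""
    n0, n1, n2, n3 = counts_mod4(exponents)
    return n0 == n2 and n1 == n3

def column_sum_sq(col):
    """Exact squared modulus of the column sum."""
    n0, n1, n2, n3 = counts_mod4(col)
    return (n0 - n2) ** 2 + (n1 - n3) ** 2

def check_st1_st2(cols):
    cols_list = list(cols)
    n = len(cols_list[0])
    closed = all(tuple((a + b) % 4 for a, b in zip(c1, c2)) in cols
                 for c1 in cols_list for c2 in cols_list)
    rows_sum_zero = all(power_sum_is_zero(c[x] for c in cols_list)
                        for x in range(n))
    rows_orthogonal = all(power_sum_is_zero(c[x] - c[y] for c in cols_list)
                          for x in range(n) for y in range(x + 1, n))
    return closed, rows_sum_zero, rows_orthogonal

cols = dg_exponents(3)
```

Working with exponents modulo 4 rather than floating-point complex numbers turns the group operation into tuple addition, which keeps the closure test exact and fast.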

Coherence between orthonormal bases $\Gamma_P$ and $\Gamma_Q$ indexed by binary symmetric matrices $P$ and $Q$ is determined by the rank $R$ of the binary matrix $P \oplus Q$ (see Appendix A). Any vector in one of the orthonormal bases has inner product of absolute value $2^{-R/2}$ with $2^{R}$ vectors in the other basis and is orthogonal to the remaining basis vectors. The column sums in this $r$-th Delsarte-Goethals sensing matrix satisfy

$$\left| \sum_{x} \varphi_{P,b}(x) \right|^2 = 0 \quad \text{or} \quad N^{2 - R/m},$$

so that condition (St3) is trivially satisfied. Details are provided in Appendix A; we refer the interested reader to [31], [32], [30] and Chapter 15 of [33] for more information about subcodes of the second order Reed-Muller code.

C. BCH Sensing Matrices 

The Carlitz-Uchiyama Bounds (see Chapter 9 of [33]) imply that the interval

$$\left[\, 2^{m-1} - (t-1)\,2^{m/2},\;\; 2^{m-1} + (t-1)\,2^{m/2} \,\right]$$

contains all non-zero weights in the dual of the extended binary BCH code $BCH(m, t)$ of length $N = 2^m$ and designed distance $e = 2t + 1$, with the exception of $\mathrm{wt}(\mathbf{1}) = N$. Setting $BCH(m, t)^{\perp} = \langle \mathbf{1} \rangle \oplus C_{m,t}$, the columns of the $t$-th BCH sensing matrix are obtained by exponentiating the codewords in $C_{m,t}$. The column determined by the codeword $c = (c_j)$ is given by

$$\varphi_c(j) = (-1)^{b c^{\top}} (-1)^{c_j}, \quad \text{where } j = 0, 1, \ldots, 2^m - 1,$$

and where $b$ is any vector not orthogonal to $C_{m,t}$. Conditions (St1) and (St2) hold by construction and

$$\left| \sum_{j} \varphi_c(j) \right|^2 = \left| (-1)^{b c^{\top}} \sum_{j=0}^{2^m - 1} (-1)^{c_j} \right|^2 = \left| N - 2\,\mathrm{wt}_H(c) \right|^2 \le \left[ 2(t-1)\,2^{m/2} \right]^2,$$

so that (St3) holds. These sensing matrices have been analyzed by Ailon and Liberty [34].

In the binary case, the column sums take the form $N - 2w$ where $w$ is the Hamming weight of the exponentiated codeword, and a similar interpretation is possible for codes that are linear over the ring of integers modulo 4 (see [30]). Property (St3) connects the Hamming geometry of the code domain, as captured by the weight enumerator of the code, with the geometry of the complex domain.

III. Implications for Deterministic StRIP-able 
Sensing Matrices: Main Result 

In this section we prove our main result, namely that if $\Phi$ satisfies (St1), (St2) and (St3), then $\Phi$ is UStRIP, under certain fairly weak conditions on the parameters. More precisely,

Theorem 8. Suppose the $N \times C$ matrix $\Phi$ is $\eta$-StRIP-able, and suppose $k < 1 + (C - 1)\epsilon$ and $\eta > 1/2$. Then there exists a constant $c$ such that, if $N \ge \left( c\,\frac{k \log C}{\epsilon^2} \right)^{1/\eta}$, then $\Phi$ is $(k, \epsilon, \delta)$-UStRIP with

$$\delta := 2 \exp\left( - \frac{\left[ \epsilon - (k-1)/(C-1) \right]^2 N^{\eta}}{8k} \right).$$

The proof of Theorem 8 has two parts: we shall first, in Section 3.1, prove that $\Phi$ is StRIP; when this is established we turn our attention to proving UStRIP in Section 3.2.
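
As a quick arithmetic illustration of the failure probability appearing in the StRIP part of the proof (Section 3.1), the following sketch evaluates $2\exp\left(-[\epsilon - (k-1)/(C-1)]^2 N^{\eta}/(8k)\right)$ for given parameters. The function name and the sample values are hypothetical; the unspecified constant $c$ of Theorem 8 is not modeled, so the function only evaluates this expression and checks the side conditions $k < 1 + (C-1)\epsilon$ and $\eta > 1/2$.

```python
import math

def strip_failure_bound(n_rows, n_cols, k, eps, eta=1.0):
    """Evaluate 2 * exp(-(eps - (k-1)/(C-1))^2 * N^eta / (8k))."""
    gap = eps - (k - 1) / (n_cols - 1)
    if gap <= 0:
        raise ValueError("need k < 1 + (C - 1) * eps")
    if eta <= 0.5:
        raise ValueError("need eta > 1/2")
    return 2.0 * math.exp(-(gap ** 2) * n_rows ** eta / (8 * k))

# Hypothetical sizes: N = 2^11 rows, C = 2^22 columns, sparsity 4, distortion 0.5.
delta = strip_failure_bound(2048, 2 ** 22, k=4, eps=0.5)
print(delta)
```

Because $N^{\eta}$ sits in the exponent, doubling $N$ (for $\eta = 1$) squares the factor $\delta/2$, while the bound degrades linearly in the exponent as $k$ grows.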

A. Proving StRIP 

3.1.1 Setting up the Framework 

It will be convenient to decompose the random process generating the vectors $\alpha$ as follows: first pick (randomly) the indices of the nonzero entries of $\alpha$, and then the values of those entries. For the first step, we pick a random permutation $\pi = (\pi_1, \ldots, \pi_C)$ of $\{1, \ldots, C\}$; the $k$ numbers $\pi_1, \ldots, \pi_k$ will then be the indices of the non-vanishing entries of $\alpha$. Next, we pick $k$ random values $\alpha_1, \ldots, \alpha_k$; these will be the non-zero values of the entries of the vector $\alpha$. Computing expectations with respect to $\alpha$ can be decomposed likewise; when we average over all possible choices of $\pi$, but not yet over the values of the random variables $\alpha_1, \ldots, \alpha_k$, we shall denote such expectations by $E_{\pi}$, adding a subscript. We start by proving the following

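Lemma 9. For $\pi$, $\Phi$, $\alpha$ as described above and $f := N^{-1/2}\Phi\alpha$, we have

$$\left( 1 - \frac{k-1}{C-1} \right) \|\alpha\|^2 \;\le\; E_{\pi}\left[ \|f\|^2 \right] \;\le\; \left( 1 + \frac{1}{C-1} \right) \|\alpha\|^2.$$

Proof: With the notations introduced above, the entries of $f := N^{-1/2}\Phi\alpha$ are given by $f(x) = N^{-1/2} \sum_{j=1}^{k} \alpha_j\, \varphi_{\pi_j}(x)$. We have then

$$\|f\|^2 = \frac{1}{N} \sum_{x=1}^{N} \left| \sum_{j=1}^{k} \alpha_j\, \varphi_{\pi_j}(x) \right|^2 = \sum_{j=1}^{k} |\alpha_j|^2 + \frac{1}{N} \sum_{x} \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j}\, \psi^{\pi}_{i,j}(x), \tag{8}$$

where $\psi^{\pi}_{i,j}(x) = \varphi_{\pi_i}(x)\, \overline{\varphi_{\pi_j}(x)}$.

The first term in (8) is independent of $\pi$; it just equals $\sum_{i=1}^{k} |\alpha_i|^2 = \|\alpha\|^2$.

For the second term, we have

$$E_{\pi}\left[ \sum_{x} \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j}\, \psi^{\pi}_{i,j}(x) \right] = \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j}\; E_{\pi}\left[ \sum_{x} \varphi_{\pi_i}(x)\, \overline{\varphi_{\pi_j}(x)} \right]. \tag{9}$$

By (4) and Lemma 5 we have $\varphi_{\ell}(x)\, \overline{\varphi_{\ell'}(x)} = \varphi_{m}(x)$ for some appropriate $m := m(\ell, \ell')$; if $\ell \ne \ell'$, then $\varphi_{\ell} \ne \varphi_{\ell'} = \left[\, \overline{\varphi_{\ell'}} \,\right]^{-1}$, so that $m(\ell, \ell') \ne 1$. As $\pi$ ranges over all possible permutations of $\{1, \ldots, C\}$, the index $m(\pi_i, \pi_j)$ (with $j \ne i$) will range (uniformly) over all possible values $2, \ldots, C$ (i.e. excluding 1). It follows that, for $i \ne j$,

$$E_{\pi}\left[ \sum_{x} \varphi_{\pi_i}(x)\, \overline{\varphi_{\pi_j}(x)} \right] = (C-1)^{-1} \sum_{m=2}^{C} \sum_{x} \varphi_m(x) = (C-1)^{-1} \sum_{x} (-1) = - \frac{N}{C-1}, \tag{10}$$

where we have made use of a counting argument in the first equality, and of (3) in the second. It then follows that

$$E_{\pi}\left[ \sum_{x} \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j}\, \psi^{\pi}_{i,j}(x) \right] = - \frac{N}{C-1} \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j} = - \frac{N}{C-1} \left( \left| \sum_{i=1}^{k} \alpha_i \right|^2 - \|\alpha\|^2 \right).$$

Applying the Cauchy-Schwarz inequality, we obtain

$$0 \le \left| \sum_{i=1}^{k} \alpha_i \right|^2 \le k \sum_{i=1}^{k} |\alpha_i|^2 = k\,\|\alpha\|^2.$$

Combining this with the previous equality gives

$$- \frac{N(k-1)}{C-1}\, \|\alpha\|^2 \;\le\; E_{\pi}\left[ \sum_{x} \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j}\, \psi^{\pi}_{i,j}(x) \right] \;\le\; \frac{N}{C-1}\, \|\alpha\|^2.$$

It then suffices to substitute this into (8) to prove the Lemma. ■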



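Remark 10. By using the Cauchy-Schwarz inequality in the last step of the proof of Lemma 9 we may have sacrificed quite a bit, especially if the non-vanishing entries in $\alpha$ differ appreciably in order of magnitude. Without this step, the final inequality would be

$$- \frac{N}{C-1} \left( \|\alpha\|_1^2 - \|\alpha\|^2 \right) \;\le\; E_{\pi}\left[ \sum_{x} \sum_{i,j \text{ with } j \ne i} \alpha_i \overline{\alpha_j}\, \psi^{\pi}_{i,j}(x) \right] \;\le\; \frac{N}{C-1}\, \|\alpha\|^2. \tag{11}$$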
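
The computation behind Lemma 9 can be checked exhaustively on a small example. The sketch below is illustrative only: it uses a chirp-type matrix with an assumed extra phase factor $\omega^{r}$ (entries $\omega^{r + rx^2 + mx}$), which satisfies (St1) and (St2), averages $\|f\|^2$ over all ordered $k$-tuples of distinct column indices, and compares the result with the closed form $\|\alpha\|^2 - \left(|\sum_i \alpha_i|^2 - \|\alpha\|^2\right)/(C-1)$ that follows from (8), (9) and (10). Function names and the sample vector are assumptions.

```python
import cmath
from itertools import permutations

def chirp_matrix(p):
    """p x p^2 matrix with entries w^(r + r*x^2 + m*x) (assumed phase factor w^r)."""
    w = cmath.exp(2j * cmath.pi / p)
    return [[w ** ((r + r * x * x + m * x) % p)
             for r in range(p) for m in range(p)]
            for x in range(p)]

def expected_energy(phi, alpha):
    """Exact E_pi[||f||^2], f = N^(-1/2) * sum_i alpha_i * phi_{pi_i}."""
    n_rows, n_cols = len(phi), len(phi[0])
    total, count = 0.0, 0
    for pi in permutations(range(n_cols), len(alpha)):
        total += sum(abs(sum(a * phi[x][j] for a, j in zip(alpha, pi))) ** 2
                     for x in range(n_rows)) / n_rows
        count += 1
    return total / count

def predicted_energy(alpha, n_cols):
    """Closed form: ||alpha||^2 - (|sum alpha_i|^2 - ||alpha||^2) / (C - 1)."""
    norm_sq = sum(abs(a) ** 2 for a in alpha)
    return norm_sq - (abs(sum(alpha)) ** 2 - norm_sq) / (n_cols - 1)

phi = chirp_matrix(5)                      # N = 5, C = 25
alpha = [1.0, -2.0, 0.5 + 1.0j]            # k = 3 nonzero values
empirical = expected_energy(phi, alpha)
predicted = predicted_energy(alpha, len(phi[0]))
print(empirical, predicted)
```

Averaging over ordered tuples of distinct indices is equivalent to averaging over the first $k$ entries of a uniformly random permutation, which is the model used in the text.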



To prove the concentration of $\|f\|^2$ around its expected value, we will make use of a version of the McDiarmid inequality [35] based on concentration of martingale difference random variables with distinct values (as opposed to independent values for the standard McDiarmid inequality). In what follows, upper case letters denote random variables, lower case letters denote values taken on by these random variables.


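Theorem 11 (Self-Avoiding McDiarmid inequality). Let $X_1, \ldots, X_m$ be probability spaces and define $X$ as the probability space of all distinct $m$-tuples.² In other words, the set $X$ is the subset of the product set $\tilde{X} = X_1 \times \cdots \times X_m$ given by

$$X = \left\{ (t_1, \ldots, t_m) \in \prod_{i=1}^{m} X_i \,;\; \forall\, i \ne j : t_i \ne t_j \right\}; \tag{12}$$

the probability measure on $X$ is just the renormalization (so as to be a probability measure) of the restriction to $X$ of the standard product measure on $\tilde{X}$.

Let $h(t_1, \ldots, t_m)$ be a function from the set $X$ to $\mathbb{R}$, such that for any coordinate $i$, given $t_1, \ldots, t_{i-1}$:

$$\left| \sup_{u \in X_i;\; u \ne t_1, \ldots, t_{i-1}} E\left[ h(t_1, \ldots, t_{i-1}, u, T_{i+1}, \ldots, T_m) \right] - \inf_{l \in X_i;\; l \ne t_1, \ldots, t_{i-1}} E\left[ h(t_1, \ldots, t_{i-1}, l, T_{i+1}, \ldots, T_m) \right] \right| \le c_i, \tag{13}$$

where the expectations are taken over the random variables $T_{i+1}, \ldots, T_m$ (conditioned on taking values that are all different from each other and from $t_1, \ldots, t_{i-1}$ as well as $u$ (first expectation) or $l$ (second expectation)). Then for any positive $\gamma$,

$$\Pr\left[ \left| h(T_1, \ldots, T_m) - E\left[ h(T_1, \ldots, T_m) \right] \right| \ge \gamma \right] \le 2 \exp\left( - \frac{2\gamma^2}{\sum_{i=1}^{m} c_i^2} \right). \tag{14}$$

Proof: See Appendix B. ■

²We follow a widespread custom, and denote by the same letter both the set carrying the probability measure, and the probability space [i.e. the triplet (set, $\sigma$-algebra of measurable sets, measure)]. We shall specify which is meant when confusion could be possible.

3.1.2 Proof of StRIP

We are now ready to start the

Proof: (of the $(k, \epsilon, \delta)$-StRIP property, claimed in Theorem 8)

Let $V_k$ denote the set of all $k$-tuples $(\pi_1, \ldots, \pi_k)$ where $(\pi_1, \ldots, \pi_C)$ is a permutation of $\{1, 2, \ldots, C\}$. It follows from the definition that all entries of each element of $V_k$ are distinct. The set $V_k$ is finite; equipped with the counting measure, renormalized so as to have total mass 1, $V_k$ is the probability space of the positions of the $k$ non-zero entries of the random signal $\alpha$: the $(\pi_1, \ldots, \pi_k)$, corresponding to (uniformly) randomly picked permutations $\pi$ of $\{1, \ldots, C\}$, are random variables distributed uniformly in $V_k$. For $1 \le i \le j \le k$, we denote by $\pi_{i:j}$ the $(j - i + 1)$-tuple of random variables $(\pi_i, \pi_{i+1}, \ldots, \pi_j)$.

Given values $\alpha_1, \alpha_2, \ldots, \alpha_k$, let $f : V_k \to \mathbb{C}^N$ be defined by $f(\pi_1, \ldots, \pi_k) = N^{-1/2} \sum_{j=1}^{k} \alpha_j\, \varphi_{\pi_j}$, and $h : V_k \to \mathbb{R}$ by $h(\pi_1, \ldots, \pi_k) = \|f(\pi_1, \ldots, \pi_k)\|^2$. Clearly

$$h(\pi_1, \ldots, \pi_k) = \frac{1}{N} \sum_{x} \left| \sum_{j=1}^{k} \alpha_j\, \varphi_{\pi_j}(x) \right|^2. \tag{15}$$

Our strategy of proof will be the following. We want to upper bound $\Pr_{\pi}\left[ \left| \|f\|^2 - \|\alpha\|^2 \right| \ge \epsilon \|\alpha\|^2 \right]$. From Lemma 9 we know that $E_{\pi}[\|f\|^2]$ is close to $\|\alpha\|^2$. This suggests that we investigate, for $\beta > 0$, the function $G(\beta)$ defined by

$$G(\beta) = \Pr_{\pi}\left[ \left| \|f\|^2 - E_{\pi}[\|f\|^2] \right| \ge \beta \|\alpha\|^2 \right] = \Pr_{\pi}\left[ \left| h - E_{\pi}[h] \right| \ge \beta \|\alpha\|^2 \right].$$

This last expression is exactly of the type for which the Self-Avoiding McDiarmid Inequality gives upper bounds, provided we can establish first that $h$ satisfies the required conditions of the Self-Avoiding McDiarmid inequality. Deriving such a bound is thus our first step.

From (15) and Lemma 4 we get

$$h(\pi_1, \ldots, \pi_k) = \|\alpha\|^2 + \frac{1}{N} \sum_{x} \sum_{i,j \text{ with } i \ne j} \alpha_i \overline{\alpha_j}\, \varphi_{m(\pi_i, \pi_j)}(x).$$

We have then, for two $k$-tuples that differ only in the $\ell$-th position,

$$h(\pi_1, \ldots, \pi_\ell, \ldots, \pi_k) - h(\pi_1, \ldots, \pi'_\ell, \ldots, \pi_k) = \frac{1}{N} \sum_{x} \left[ \sum_{j \text{ with } j \ne \ell} \alpha_\ell \overline{\alpha_j} \left( \varphi_{m(\pi_\ell, \pi_j)}(x) - \varphi_{m(\pi'_\ell, \pi_j)}(x) \right) + \sum_{j \text{ with } j \ne \ell} \alpha_j \overline{\alpha_\ell} \left( \varphi_{m(\pi_j, \pi_\ell)}(x) - \varphi_{m(\pi_j, \pi'_\ell)}(x) \right) \right],$$

where we have used the same notation as in the proof of Lemma 9, i.e. $\varphi_{m(i,j)}(x) := \varphi_i(x)\, \overline{\varphi_j(x)}$.

Because $(\pi_1, \ldots, \pi_\ell, \ldots, \pi_k)$ and $(\pi_1, \ldots, \pi'_\ell, \ldots, \pi_k)$ are both in $V_k$, the indices $\pi_1, \ldots, \pi_\ell, \ldots, \pi_k$ and $\pi'_\ell$ are all different. It then follows from (5) that

$$\left| h(\pi_1, \ldots, \pi_\ell, \ldots, \pi_k) - h(\pi_1, \ldots, \pi'_\ell, \ldots, \pi_k) \right| \le \frac{2}{N}\, |\alpha_\ell| \sum_{j \text{ with } j \ne \ell} |\alpha_j| \cdot 2 N^{1 - \eta/2} = 4 N^{-\eta/2}\, |\alpha_\ell| \sum_{j \text{ with } j \ne \ell} |\alpha_j|, \tag{16}$$

where we have used that $m(\pi_i, \pi_j) \ne 1$ if $\pi_i \ne \pi_j$, i.e. if $i \ne j$. Because this bound is uniform over the $\pi'_\ell$ in $X_\ell$, it is now clear that this implies the sufficient condition of the Self-Avoiding McDiarmid inequality, with $c_\ell$ given by the right hand side of (16). We can thus conclude from Theorem 11 that

$$\Pr_{\pi}\left[ \left| h - E[h] \right| \ge \beta \|\alpha\|^2 \right] \le 2 \exp\left( - \frac{2 \beta^2 N^{\eta} \|\alpha\|^4}{16 \sum_{\ell} |\alpha_\ell|^2 \left( \sum_{j \text{ with } j \ne \ell} |\alpha_j| \right)^2} \right). \tag{17}$$

Since

$$\sum_{\ell=1}^{k} |\alpha_\ell|^2 \left( \sum_{j \text{ with } j \ne \ell} |\alpha_j| \right)^2 \le \sum_{\ell=1}^{k} |\alpha_\ell|^2 \left( \sum_{j=1}^{k} |\alpha_j| \right)^2 \le \|\alpha\|^2\, k \|\alpha\|^2 = k \|\alpha\|^4,$$

where we have used the Cauchy-Schwarz inequality in the penultimate step, it follows that

$$\Pr_{\pi}\left[ \left| h - E[h] \right| \ge \beta \|\alpha\|^2 \right] \le 2 \exp\left( - \frac{\beta^2 N^{\eta}}{8k} \right).$$

After substituting for $h$, and applying Lemma 9, we finally obtain that

$$\Pr_{\pi}\left[ \left| \|f\|^2 - \|\alpha\|^2 \right| \ge \left( \beta + \frac{k-1}{C-1} \right) \|\alpha\|^2 \right] \le 2 \exp\left( - \frac{\beta^2 N^{\eta}}{8k} \right).$$

For $\epsilon > (k-1)/(C-1)$, we can set $\beta = \epsilon - (k-1)/(C-1)$, thus recovering the StRIP-bound claimed in the statement of Theorem 8 for this case: with probability at least $1 - 2 \exp\left( - \frac{[\epsilon - (k-1)/(C-1)]^2 N^{\eta}}{8k} \right)$, we have the following near-isometry for $k$-sparse vectors $\alpha$:

$$(1 - \epsilon)\, \|\alpha\|^2 \le \|f\|^2 \le (1 + \epsilon)\, \|\alpha\|^2. \tag{18}$$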
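
The bounded-difference step that feeds the Self-Avoiding McDiarmid inequality can also be verified by brute force on a small instance. The following sketch is an illustration under assumptions: it reuses a chirp-type matrix with an assumed phase factor $\omega^{r}$ (for which the squared non-identity column sums are at most $N$, i.e. $\eta = 1$), replaces one selected column index at a time, and compares the largest observed change of $h = \|f\|^2$ with the constant $4 N^{-\eta/2} |\alpha_\ell| \sum_{j \ne \ell} |\alpha_j|$. All names and sample values are hypothetical.

```python
import cmath
from itertools import permutations

def chirp_matrix(p):
    """p x p^2 matrix with entries w^(r + r*x^2 + m*x) (assumed phase factor w^r)."""
    w = cmath.exp(2j * cmath.pi / p)
    return [[w ** ((r + r * x * x + m * x) % p)
             for r in range(p) for m in range(p)]
            for x in range(p)]

def h(phi, alpha, pi):
    """h(pi) = ||f||^2 with f = N^(-1/2) * sum_j alpha_j * phi_{pi_j}."""
    n_rows = len(phi)
    return sum(abs(sum(a * phi[x][j] for a, j in zip(alpha, pi))) ** 2
               for x in range(n_rows)) / n_rows

def max_change(phi, alpha, pos):
    """Largest |h(pi) - h(pi')| over tuples that differ only in position pos."""
    n_cols = len(phi[0])
    worst = 0.0
    for pi in permutations(range(n_cols), len(alpha)):
        base = h(phi, alpha, pi)
        for new in range(n_cols):
            if new in pi:
                continue  # indices must stay distinct
            alt = pi[:pos] + (new,) + pi[pos + 1:]
            worst = max(worst, abs(base - h(phi, alpha, alt)))
    return worst

def difference_bound(alpha, pos, n_rows, eta):
    """4 * N^(-eta/2) * |alpha_pos| * sum_{j != pos} |alpha_j|."""
    others = sum(abs(a) for j, a in enumerate(alpha) if j != pos)
    return 4.0 * n_rows ** (-eta / 2) * abs(alpha[pos]) * others

phi = chirp_matrix(5)                  # N = 5, C = 25, eta = 1
alpha = [1.0, -0.5 + 2.0j]             # k = 2
observed = [max_change(phi, alpha, pos) for pos in range(len(alpha))]
bounds = [difference_bound(alpha, pos, len(phi), 1.0) for pos in range(len(alpha))]
print(observed, bounds)
```

The exhaustive loop is only practical for very small $C$ and $k$; it is meant as a sanity check of the constant, not as a way to estimate the tail probability itself.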



Remark 12. Equation (18) implies that as long as $\frac{k-1}{C-1} < \epsilon$, the probability of failure (i.e. the probability that the near-isometry inequality fails to hold) drops to zero as $C \to \infty$.
In particular, if j] equals 1, k < /i(C— l)e+l for some constant 
fi less than one, and iV = O then the probabiHty of 

failure approaches zero at the rate C^^. 

Remark 13. Figure 1 shows the distribution of condition numbers for the singular values of restrictions of the sensing matrix to sets of $K$ columns. Two cases are considered: the Reed Muller matrices constructed in Section 2.2 and random Gaussian matrices of the same size. The figure suggests that the decay of

$$\Pr\left[ \left| \|f\|^2 - \|\alpha\|^2 \right| \ge \epsilon \|\alpha\|^2 \right]$$

is similar for both types of compressive sensing matrices.

Remark 14. Note that, as in the case of random and expander matrices, the number of measurements grows as the inverse square of the distortion parameter $\epsilon$, i.e. $N \propto 1/\epsilon^2$ as $\epsilon \to 0$.

Remark 15. By avoiding the use of the Cauchy-Schwarz inequality at the end of the proof, and making use of Remark 10, one can sharpen the bounds. From (17) it follows that with

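probability at least $1 - 2\exp\left(-\dfrac{\beta^2 N^{\eta}\|\alpha\|^2}{8\|\alpha\|_1^2}\right)$,

$$-\Big(\beta\|\alpha\|^2 + \frac{\|\alpha\|_1^2 - \|\alpha\|^2}{C-1}\Big) \le \|f\|^2 - \|\alpha\|^2$$

and

$$\|f\|^2 - \|\alpha\|^2 \le \Big(\beta + \frac{1}{C-1}\Big)\|\alpha\|^2.$$

This implies (set $\beta = \gamma\rho$, $\rho = \|\alpha\|_1\|\alpha\|^{-1}$)

$$\Pr_\pi\left[\,\Big|\|f\|^2 - \|\alpha\|^2\Big| \ge \Big(\gamma\rho + \frac{1}{C-1}\max(1,\rho^2-1)\Big)\|\alpha\|^2\right] \le 2\exp\left(-\frac{\gamma^2 N^{\eta}}{8}\right),$$

or, equivalently,

$$\Pr_\pi\Big[\,\Big|\|f\|^2 - \|\alpha\|^2\Big| \ge \epsilon\|\alpha\|^2\Big] \le 2\exp\left(-\frac{N^{\eta}}{8}\left[\frac{\epsilon}{\rho} - \frac{1 + \chi_{(\rho>\sqrt{2})}\,(\rho^2-2)}{\rho(C-1)}\right]^2\right),$$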



with $\rho = \|\alpha\|_1\|\alpha\|^{-1}$, as above, and $\chi_{(a>0)} = 1$ if $a > 0$, $\chi_{(a>0)} = 0$ otherwise. The worst case for this bound is when $\rho = \sqrt{k}$, in which case we recover the bound in Theorem 8; if one is restricted, for whatever reason, to $k$-sparse vectors that are known to have some entries that are much larger than other nonvanishing entries, then the more complicated bound given here is tighter.

Remark 16. If the sparsity level $k$ is greater than $\sqrt{C}$, then $C < k^2 < N^2$. However, since some deterministic sensing matrices of Section II structurally require the condition $N^2 \le C$, a deterministic matrix with $N' = O(k\log C)$ rows and $C' \ge N'^2$ columns is required. In this case, the $N' \times C$ sensing matrix $\Phi$ is constructed by choosing $C$ random columns from the $N' \times C'$ deterministic matrix.

B. Proving UStRIP: Uniqueness of Sparse Representation 

Although we have established the desired near-isometry bounds, we still have to address the uniqueness guarantee; unlike the standard RIP case, this does not follow automatically from a StRIP bound, as pointed out in the Introduction. More precisely, we need to estimate the probability that a randomly picked $k$-sparse vector $\alpha$ has an "evil twin" $\alpha' \neq \alpha$ that maps to the same image under $\Phi$, i.e. $\Phi\alpha' = \Phi\alpha$, and prove that this probability is very small.

If $S \subset \{1,\ldots,C\}$ is the union of possible support sets of two $k$-sparse vectors, that is, if $s = |S| \le 2k$, then we define $\Phi_S$ to be the $N \times s$ matrix obtained by picking out only the columns indexed by labels in $S$. In other words, the matrix elements of $\Phi_S$ are those $\varphi_j(x)$ for which $j \in S$, with $x$ varying over its full range. There will be two different $k$-sparse vectors $\alpha' \neq \alpha$ with the same image under $\Phi$, the supports of which are both contained in $S$, if and only if the $s \times s$ matrix $\Phi_S^\dagger\Phi_S$ is rank-deficient (where $\Phi^\dagger$ denotes the conjugate transpose of $\Phi$). Note that this property concerns the support set $S$ only; the values of the entries of $\alpha$ are not important. This is similar to the discussion of sparse reconstruction when $\Phi$ satisfies a deterministic Null Space Property [12]. Once uniqueness is found to be overwhelmingly likely, we can derive from it the probability that decoding algorithms (such as the quadratic decoding algorithms described in Section V) succeed in constructing, from $\Phi\alpha$, a faithfully exact or close copy (depending on the application) of the $k$-sparse source vector $\alpha$.
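The rank criterion above can be checked numerically on a small example. The sketch below assumes a discrete chirp sensing matrix with $N = p$ rows and $C = p^2$ columns for a small odd prime $p$ (one of the families mentioned in the abstract); the helper names `chirp_matrix` and `gram_is_rank_deficient` are hypothetical and introduced only for illustration.

```python
import numpy as np

def chirp_matrix(p):
    """p x p^2 discrete chirp matrix: column (r, m) is exp(2*pi*i*(r*x^2 + m*x)/p)."""
    x = np.arange(p)
    cols = [np.exp(2j * np.pi * (r * x**2 + m * x) / p)
            for r in range(p) for m in range(p)]
    return np.stack(cols, axis=1)

def gram_is_rank_deficient(Phi, S, tol=1e-9):
    """True if the s x s Gram matrix Phi_S^dagger Phi_S has rank below s = |S|."""
    Phi_S = Phi[:, list(S)]
    gram = Phi_S.conj().T @ Phi_S
    return bool(np.linalg.matrix_rank(gram, tol=tol) < len(S))

Phi = chirp_matrix(7)                 # N = 7 rows, C = 49 columns
small_support = [0, 8, 17]            # 3 columns with distinct chirp rates
large_support = list(range(8))        # 8 columns cannot be independent in C^7
small_deficient = gram_is_rank_deficient(Phi, small_support)
large_deficient = gram_is_rank_deficient(Phi, large_support)
print(small_deficient, large_deficient)
```

Any support set with more than $N$ columns is necessarily rank-deficient, so the interesting regime is $|S| \le 2k \ll N$.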

In fact, it turns out that we won't even have to consider matrices $\Phi_S$ with $|S| = 2k$; as we shall see below, it suffices to consider $\Phi_S$ for sets $S$ of cardinality up to $k$.

Once again, condition (St3) will play a crucial role. For the StRIP analysis, in the previous subsection, it sufficed to take $\eta > 0$, where $\eta$ is the parameter that measures the closeness of column sums in (St3). In this subsection, we will impose a non-zero lower bound on $\eta$; we shall see that $\eta > 0.5$ suffices for our analysis.

We recall here the formulation of (St3): for any column $\varphi_j$ of the sensing matrix, with $j \ge 2$,

$$\Big|\sum_{x}\varphi_j(x)\Big|^2 \le N^{2-\eta}.$$

We introduced the notation $\Phi_S$ at the start of this subsection. We shall also use the special case where we wish to restrict the sensing matrix $\Phi$ to a single column indexed by $w$; in that case, we denote the restriction by $\varphi_w$. Finally, we denote the conjugate transpose of a matrix $\Phi_S$ by $\Phi_S^\dagger$. We shall use Tropp's argument (see Section 7 of [Tro08b]) to prove uniqueness of sparse representation; to apply this argument we first need to prove that a random submatrix $\Phi_\Lambda$ has small coherence with the remaining columns of the sensing matrix.

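Lemma 17. Let $\Phi$ be $\eta$-StRIP-able with $\eta > 1/2$, and assume that the conditions $k < \epsilon(C-1)+1$ and $N = O\big((k\log C/\epsilon^2)^{1/\eta}\big)$ hold, and $\delta$ is as defined in Theorem 8,

$$\delta = 2\exp\left(-\frac{\big[\epsilon-(k-1)/(C-1)\big]^2 N^{\eta}}{8k}\right).$$

Let $w$ be a fixed column of $\Phi$, and let $K = \{\kappa_1,\ldots,\kappa_k\}$ be the positions of the first $k$ elements of a random permutation of $\{1,\ldots,C\}\setminus\{w\}$. Then

$$\mathbb{E}_K\left[\Big\|\frac{1}{N}\Phi_K^\dagger\varphi_w\Big\|^2\right] = \frac{k}{N}\,\frac{C-N}{C-1}, \qquad (19)$$

where the expectation is with respect to the choice of the set $K$.
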

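Proof: By linearity of expectation we have

$$\mathbb{E}_K\left[\Big\|\frac{1}{N}\Phi_K^\dagger\varphi_w\Big\|^2\right] = \sum_{i=1}^{k}\mathbb{E}_K\left[\Big|\frac{1}{N}\varphi_{\kappa_i}^\dagger\varphi_w\Big|^2\right]. \qquad (20)$$

Since the set of columns of $\Phi$ is invariant under complex conjugation, and forms a group under pointwise multiplication, we have

$$\varphi_{\kappa_i}^\dagger\varphi_w = \sum_x \overline{\varphi_{\kappa_i}(x)}\,\varphi_w(x) = \sum_x \varphi_{m(w,\kappa_i)}(x),$$

where we use again the notation introduced just below (9): $\varphi_\ell(x)\overline{\varphi_{\ell'}(x)} = \varphi_{m(\ell,\ell')}(x)$. As $K$ ranges over all the possible permutations that do not move $w$, $\kappa_i$ ranges uniformly over $\{1,\ldots,C\}\setminus\{w\}$, and the different $z_i := m(w,\kappa_i)$ range uniformly over $\{2,\ldots,C\}$.

Hence:

$$\mathbb{E}_K\left[\Big\|\frac{1}{N}\Phi_K^\dagger\varphi_w\Big\|^2\right] = \frac{k}{N^2}\,\mathbb{E}\left[\Big|\sum_{x}\varphi_{z_1}(x)\Big|^2\right] = \frac{k}{N^2}\,\frac{1}{C-1}\sum_{j=2}^{C}\sum_{x,y=1}^{N}\varphi_j(x)\overline{\varphi_j(y)} \qquad (21)$$

$$= \frac{k}{N^2}\,\frac{1}{C-1}\,\big(CN - N^2\big) = \frac{k}{N}\,\frac{C-N}{C-1},$$

where we have used (St1).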
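The expectation in Lemma 17 can be sanity-checked on a small matrix. The sketch below is an illustration under stated assumptions rather than part of the proof: it uses a discrete chirp matrix with $N = p$ and $C = p^2$ (whose rows are orthogonal and whose columns form a group), and it averages exactly over all columns $t \neq w$ instead of sampling; by linearity the expectation over a random $k$-subset is $k$ times this average. The helper names are hypothetical.

```python
import numpy as np

def chirp_matrix(p):
    """p x p^2 discrete chirp matrix: column (r, m) is exp(2*pi*i*(r*x^2 + m*x)/p)."""
    x = np.arange(p)
    cols = [np.exp(2j * np.pi * (r * x**2 + m * x) / p)
            for r in range(p) for m in range(p)]
    return np.stack(cols, axis=1)

def mean_squared_coherence(Phi, w):
    """Average of |(1/N) <phi_t, phi_w>|^2 over all columns t != w."""
    N, _ = Phi.shape
    inner = Phi.conj().T @ Phi[:, w] / N
    squared = np.abs(inner) ** 2
    return np.delete(squared, w).mean()

p, k, w = 7, 3, 5                      # assumed toy parameters
Phi = chirp_matrix(p)
N, C = Phi.shape
lhs = k * mean_squared_coherence(Phi, w)      # k * E|(1/N) phi_t^dagger phi_w|^2
rhs = k * (C - N) / (N * (C - 1))             # right hand side of (19)
print(lhs, rhs)
```

For this family both sides reduce to $k/(p+1)$, since each pair of columns with different chirp rates has inner product of magnitude $\sqrt{p}$ and columns with the same chirp rate are orthogonal.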



Next, we use the Self-Avoiding McDiarmid inequality, 
together with property (St3) to derive a uniform bound for 
the random variable : 

Theorem 18. Let $\Phi$ be $\eta$-StRIP-able with $\eta > 1/2$, and assume that the conditions $k < \epsilon(C-1)+1$ and $N = \Omega\big((k\log C/\epsilon^2)^{1/\eta}\big)$ hold, define $\delta$ as in Theorem 8, and let $\Lambda$ be a set of $k$ random columns of $\Phi$. Then with probability at least $1-\delta$, there exists no $w$ such that



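$$\Big\|\frac{1}{N}\Phi_\Lambda^\dagger\varphi_w\Big\|^2 \ge \frac{k}{N}\,\frac{C-N}{C-1} + \frac{\sqrt{2k\log(C/\delta)}}{N^{\eta}}. \qquad (22)$$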



Proof: The proof is in several steps. In the first step, we pick any $w \in \{1,\ldots,C\}$, and keep it fixed (for the time being). Let



k 



where we assume that $t_1,\ldots,t_k$ are $k$ different elements of $\{1,\ldots,C\}\setminus\{w\}$, picked at random. Note that if $\lambda$ is a random permutation of $\{1,\ldots,C\}\setminus\{w\}$, then $f(\lambda_1,\ldots,\lambda_k) =$



1 $t 1 



The function /, as defined above from 



E[/(ti 

nfih 
1 



Ar2 
1 

7^ 



< 2N- 



J ti—lj ti, ij + l , 
1 



N 



N 



,tk)] 
,tk) 



x=l 



(23) 



by (St3), since $m(w,t_i) \neq 1 \neq m(w,t'_i)$. It immediately follows that the concentration condition holds for $f$, with $c_i = 2N^{-\eta}$. Therefore the Self-Avoiding McDiarmid Inequality holds for $f$, which means it also holds for $F$: for any positive $\gamma$,

2 " 



< cxp 



1 



N 



1 



> 



k 

N 



-1 



N 



k C- N 

> h 7 

- A^ C - 1 ' 



2k 



All this was for one fixed choice of $w$; note that the bound does not depend on the identity of $w$. This implies that, by applying union bounds over the $C$ possible choices for the column $w$ of $\Phi$, we get that the probability that there exists a $w$ such that

' k 

> h 7, 

- N 

. Writing 7 in terms of S completes 

■ 

the right hand side of (1221) 




log<5| 



ll/2 



logC 



(fclogC) 



$\{(t_1,t_2,\ldots,t_k);\ t_i \in \{1,\ldots,C\}\setminus\{w\}\ \forall i,\ t_i \neq t_j\ \forall i \neq j\}$

to $\mathbb{R}$, is information-theoretically indistinguishable from the function $F$ from the permutations of $\{1,\ldots,C\}\setminus\{w\}$ to $\mathbb{R}$ defined by



Thus, if $\eta > 1/2$, then (for sufficiently small $\epsilon$, and sufficiently large $C$) a choice of $k$ random columns of $\Phi$ has a very high probability of having small coherence with any other column of the matrix; in particular, we have, with probability exceeding $1-\delta$, that

2 



1 



1 



N 



(24) 



F{\) = 



1 



1 



'N VN' 

We have computed $\mathbb{E}[f] = \mathbb{E}[F]$ in Lemma 17; in order to apply the Self-Avoiding McDiarmid Inequality to $f$, we need verify only that the sufficient condition of the Self-Avoiding McDiarmid inequality holds.

When we subtract $f(t_1,\ldots,t_{i-1},t'_i,t_{i+1},\ldots,t_k)$ from $f(t_1,\ldots,t_{i-1},t_i,t_{i+1},\ldots,t_k)$, only the $i$-th term survives;



This establishes incoherence between the random submatrix $\Phi_\Lambda$ and the remaining columns of the sensing matrix.

We can now complete the UStRIP proof by following an argument of Tropp [36]; for completeness we include the argument here:

Lemma 19. Let $\Lambda = \{\lambda_1,\ldots,\lambda_k\}$ be a set of $k$ indices sampled uniformly from $\{1,\ldots,C\}$. Assume that $\Phi$ is $(k,\epsilon,\delta)$-StRIP. Let $S$ be any other subset of $\{1,\ldots,C\}$ of size less than or equal to $k$. Then, with probability at least $(1-\delta)$ (with respect to the randomness in the choice of $\Lambda$),

$$\dim\big(\mathrm{range}(\Phi_\Lambda)\cap\mathrm{range}(\Phi_S)\big) < k. \qquad (25)$$



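Proof: First, note that we need check only the case $\dim(\mathrm{range}(\Phi_S)) = k$, since otherwise (25) is immediate. Note also that, because $\Phi$ is $(k,\epsilon,\delta)$-StRIP, the probability that the randomly picked set $\Lambda = \{\lambda_1,\ldots,\lambda_k\}$ satisfies

$$(1-\epsilon)\,\mathrm{Id}_\Lambda \le \frac{1}{N}\Phi_\Lambda^\dagger\Phi_\Lambda \le (1+\epsilon)\,\mathrm{Id}_\Lambda$$

is at least $1-\delta$. (The notation $\mathrm{Id}_\Lambda$ stands for the identity matrix on $\Lambda$; this just amounts to restating the $(k,\epsilon,\delta)$-StRIP condition in matrix form.) It follows that, with probability at least $1-\delta$,

$$\sigma_{\min}(\Phi_\Lambda) \ge \sqrt{(1-\epsilon)N}, \qquad (26)$$

where $\sigma_{\min}(\Phi_\Lambda)$ is the smallest singular value of $\Phi_\Lambda$. Since $S \neq \Lambda$, $S$ has at least one index not in $\Lambda$. Denote that index by $s$. Since the entries of the matrix are all unimodular, we have

$$\|\varphi_s\|^2 = \sum_x|\varphi_s(x)|^2 = N. \qquad (27)$$

Let $P_\Lambda$ be the orthogonal projection operator on the range $\mathcal{R}_\Lambda$ of $\Phi_\Lambda$. We shall prove (25) by showing that $\|P_\Lambda\varphi_s\|^2 < \|\varphi_s\|^2$, which implies that there exists a vector in the range of $\Phi_S$ that is outside the range of $\Phi_\Lambda$. Note that

$$P_\Lambda = \Phi_\Lambda\big(\Phi_\Lambda^\dagger\Phi_\Lambda\big)^{-1}\Phi_\Lambda^\dagger. \qquad (28)$$

Since $\Phi_\Lambda$ is $(k,\epsilon,\delta)$-StRIP, we have, still with probability at least $1-\delta$,

$$\|P_\Lambda\varphi_s\|^2 \le \frac{\|\Phi_\Lambda^\dagger\varphi_s\|^2}{\big(\sigma_{\min}(\Phi_\Lambda)\big)^2} \le \frac{\|\Phi_\Lambda^\dagger\varphi_s\|^2}{N(1-\epsilon)} \le (1-\epsilon)N < N,$$

where the penultimate inequality is by Equation (24). ■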

Theorem 20. Let $\Phi$ be $\eta$-StRIP-able with $\eta > 1/2$, and assume that the conditions $k < \epsilon(C-1)+1$ and $N = \Omega\big((k\log C/\epsilon^2)^{1/\eta}\big)$ hold, define $\delta$ as in Theorem 8, and let $\alpha$ be a randomly picked $k$-sparse signal. Then with probability at least $1-\delta$ (with respect to the random choice of $\alpha$), $\alpha$ is the only $k$-sparse vector that satisfies the equation $f = \frac{1}{\sqrt{N}}\Phi\alpha$.

Proof: We have already proved in Section 3.1.2 that $\Phi$ is $(k,\epsilon,\delta)$-StRIP. We start by recalling that the random choice of $\alpha$ can be viewed as first choosing its support, a uniformly distributed subset of size $k$ within $\{1,\ldots,C\}$, and then, once the support is fixed, choosing a random vector within the corresponding $k$-dimensional vector space. For this last choice no distribution has been specified; we shall just assume that it is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^k$ or $\mathbb{C}^k$.



Since $\Phi$ is $(k,\epsilon,\delta)$-StRIP, $\Phi_\Lambda$ is non-singular with probability exceeding $(1-\delta)$, so that

$$\dim\big(\mathrm{range}(\Phi_\Lambda)\big) = k$$

with probability exceeding $1-\delta$. The near-isometry property of $\Phi_\Lambda$ implies that no two signals with support $\Lambda$ can have the same value in the measurement domain. If there nevertheless were a vector $\alpha'$ such that $\Phi\alpha' = \Phi\alpha$, the support $S$ of $\alpha'$ would therefore necessarily be different from $\Lambda$. By Lemma 19, we know that $V = \mathrm{range}(\Phi_\Lambda)\cap\mathrm{range}(\Phi_S)$ is at most $(k-1)$-dimensional. It follows that in order to possibly have an "evil twin" $\alpha'$, the vector $\alpha$ must itself lie in the at most $(k-1)$-dimensional space that is the inverse image of $V$ under $\Phi_\Lambda$. This set, however, has measure zero with respect to any measure that is absolutely continuous with respect to the $k$-dimensional Lebesgue measure. Thus, for each $k$-set $\Lambda$ for which $\Phi_\Lambda$ is a near-isometry, the vectors that are not uniquely determined by their image $\Phi\alpha$ constitute a set of measure zero. Since randomly chosen $k$-sets $\Lambda$ produce restrictions $\Phi_\Lambda$ that are near-isometric with probability exceeding $1-\delta$, the theorem is proved. ■

Combining Remark 12 with Theorem 20 completes the proof of Theorem 8.

IV. Partial Fourier Ensembles 

In Partial Fourier ensembles the matrix $\Phi$ is formed by uniform random selection of $N$ rows from the $C \times C$ discrete Fourier transform matrix. The resulting random sensing matrices are widely used in compressed sensing, because the corresponding memory cost is only $O(N\log C)$, in contrast to the $O(NC)$ cost of storing Gaussian and Bernoulli matrices.
Moreover, it is known HO), Q that if iV > fc log^ C, then with 
overwhelming probability, the partial Fourier matrix satisfies the RIP property. It is easy to verify that such a $\Phi$ satisfies the Conditions (St1) and (St2). We now show that it also satisfies Condition (St3) almost surely.

Note that here, in contrast to the proof of Theorem 8, the randomness is with respect to the choice of the $N$ rows from the discrete Fourier transform matrix. We show that with overwhelming probability, the condition (St3) is satisfied for every column of this randomly sampled matrix. First fix a column $\varphi_i$ other than the identity column, and define the random variable $z_x$ to be the value of the entry $\varphi_i(x)$, where the randomness is with respect to the choice of the rows of $\Phi$ (that is, with respect to the choice of $x$). Since the rows are chosen uniformly at random, and the column sums (for all but the first column) in the discrete Fourier transform are zero, we have



$$\mathbb{E}\left[\frac{1}{N}\sum_{x=1}^{N} z_x\right] = 0. \qquad (29)$$



Since all entries are unimodular, we may apply Hoeffding's inequality to both the real and the imaginary part of the random variable $\frac{1}{N}\sum_x z_x$, then apply union bounds to conclude that for all $\epsilon > 0$,

$$\Pr\left[\Big|\frac{\sum_x z_x}{N}\Big| \ge \epsilon\right] \le 4\exp\{-2N\epsilon^2\}.$$



Applying union bounds to all $C-1$ admissible columns we get






Pr [there exists a column average greater than $\epsilon$] (30)
is at most $4C\exp\{-2N\epsilon^2\}$. Hence, with probability at least
1 — (5 all column averages are O I \/^^^ I, and all column 



1 



k 



sums are less than logC, so that condition (St3) is indeed 
satisfied. Applying Theorem [8] we see that a partial Fourier 
matrix satisfies StRIP with only k logC measurements. This 
improves upon the best previous upper bound of fclog^C 
obtained in fW\ and helps explain why partial Fourier matrices 
work well in practice. 
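The column-average argument above can be explored empirically. The following sketch is a minimal illustration, assuming rows are drawn without replacement (the text only says "uniform random selection") and using hypothetical helper names and toy sizes; `union_bound` simply evaluates the expression $4C\exp\{-2N\epsilon^2\}$ quoted in the text, capped at 1.

```python
import numpy as np

def partial_fourier(C, N, rng):
    """N randomly chosen rows of the C x C DFT matrix (unnormalized, unimodular entries)."""
    rows = rng.choice(C, size=N, replace=False)   # assumption: sampling without replacement
    return np.exp(2j * np.pi * np.outer(rows, np.arange(C)) / C)

def max_column_average(Phi):
    """Largest |column average| over all columns except the all-ones (identity) column 0."""
    return np.abs(Phi[:, 1:].mean(axis=0)).max()

def union_bound(C, N, eps):
    """The bound 4*C*exp(-2*N*eps^2) from the text, capped at 1."""
    return min(1.0, 4 * C * float(np.exp(-2 * N * eps**2)))

rng = np.random.default_rng(0)
C, N = 256, 64                       # assumed toy sizes
Phi = partial_fourier(C, N, rng)
worst = max_column_average(Phi)
print(worst, union_bound(C, N, eps=0.5))
```

Repeating the experiment with larger $N$ shows the largest column average shrinking, which is the behaviour that condition (St3) requires.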



V. Quadratic Reconstruction Algorithm 



Algorithm 1 Quadratic Reconstruction Algorithm

Input: $N$-dimensional vector $f = \frac{1}{\sqrt{N}}\Phi\alpha + \nu$
Output: An approximation $\hat\alpha$ to the signal $\alpha$

1: Set $f_1 = f$, $e = \{\}$, $\hat\alpha = 0$.
2: for $t = 1,\ldots,k$ or while $\|f_t\|_2 \ge \epsilon$ do
3: for each entry $x = 1$ to $N$ do
4: pointwise multiply $f_t$ with a shift (offset) of itself as in (31).
5: end for
6: Compute the fast Walsh-Hadamard transform of the pointwise product: Equation (32).
7: Find the position $p_t$ of the next peak in the Hadamard domain: Equation (33) implies that the chirp-like cross terms appear as a constant background signal.
8: if $p_t \in \mathrm{Keys}(e)$ then
9: Restore $f_t \leftarrow f_t + e(p_t)\varphi_{p_t}$.
10: end if
11: Update $\beta_t = \frac{1}{N}\varphi_{p_t}^\dagger f_t$, which minimizes $\|f_t - \beta\varphi_{p_t}\|_2$.
12: Add $\beta_t$ to entry $p_t$ of $\hat\alpha$.
13: Set $e(p_t) = \beta_t$.
14: Set $f_{t+1} \leftarrow f_t - \beta_t\varphi_{p_t}$.
15: end for



The Quadratic Reconstruction Algorithm, described in detail above, takes advantage of the multivariable quadratic functions that appear as exponents in Delsarte-Goethals sensing matrices. It is this structure that enables the algorithm to avoid the matrix-vector multiplication required when Basis and Matching Pursuit algorithms are applied to random sensing matrices. Because our algorithm requires only vector-vector multiplication in the measurement domain, the reconstruction complexity is sublinear in the dimension of the data domain. The Delsarte-Goethals sensing matrix was introduced in Section 2.2: there are $2^m$ rows indexed by binary $m$-tuples $x$, and $2^{(r+2)m}$ columns $\varphi_{P_i,b_i}$ indexed by pairs $P_i, b_i$, where $P_i$ is a binary symmetric matrix and $b_i$ is a binary $m$-tuple. The first step in our algorithm is pointwise multiplication of a sparse superposition with a shifted copy of itself. The sensing matrix is obtained by exponentiating multivariable quadratic functions, so the first step produces a sparse superposition of pure frequencies (in the example below, these are Walsh functions in the binary domain) against a background of chirp-like cross terms.
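The first step can be illustrated on a single column, where no cross terms arise. The sketch below is a simplified illustration and not the full reconstruction algorithm: it assumes a column of the form $\varphi_{P,b}(x) = i^{xPx^T + 2bx^T}$ (the constant phase prefactor is omitted because it cancels in the product), an arbitrary binary symmetric $P$, and hypothetical helper names. Multiplying the column by the conjugate of its copy shifted by $a$ gives a pure Walsh function, so the Walsh-Hadamard transform has a single peak at the index whose bits are $aP \bmod 2$.

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering)."""
    v = np.array(v, dtype=complex)
    n, h = len(v), 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def bits(x, m):
    """Binary m-tuple of the integer x, least significant bit first."""
    return np.array([(x >> j) & 1 for j in range(m)])

def dg_column(P, b, m):
    """Column with entries i^(x P x^T + 2 b x^T), exponent taken mod 4 (prefactor omitted)."""
    return np.array([1j ** int((bits(x, m) @ P @ bits(x, m) + 2 * (b @ bits(x, m))) % 4)
                     for x in range(2**m)])

def peak_after_shift(f, a):
    """Multiply f(x XOR a) by conj(f(x)), transform, and return (peak index, spectrum)."""
    shifted = f[np.arange(len(f)) ^ a]
    spectrum = fwht(shifted * np.conj(f))
    return int(np.argmax(np.abs(spectrum))), spectrum

m = 3
P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])   # assumed binary symmetric matrix
b = np.array([1, 0, 1])
a = 0b011                                          # assumed offset
peak, spectrum = peak_after_shift(dg_column(P, b, m), a)
expected_peak = sum(int(v) << j for j, v in enumerate((bits(a, m) @ P) % 2))
print(peak, expected_peak)
```

With several columns superposed, the same transform would show up to $k$ such peaks on top of the background produced by the cross terms; varying the offset $a$ then reveals the rows of each $P_j$.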



f{x + a)f{x 



(31) 



TV 



Then the (fast) Hadamard transform concentrates the energy of the first term $\frac{1}{N}\sum_{j=1}^{k}|\alpha_j|^2(-1)^{aP_jx^T}$ at (no more than) $k$ Walsh-Hadamard tones, while the second term distributes energy uniformly across all $N$ tones. The $\ell$-th Fourier coefficient is



(32) 

and it can be shown (see IZSll ) that the energy of the chirp-like 
cross terms is distributed uniformly in the Walsh-Hadamard 
domain. That is for any coefficient I 

I 21 



lim 



N- 



r 



El 



(33) 



Equation (33) is related to the variance of $f$ and may be viewed as a fine-grained concentration estimate. In fact the proof of (33) mirrors the proof of the UStRIP property given in Section 3; first we show that the expected value of any Walsh-Hadamard coefficient is zero, and then we use the Self-Avoiding McDiarmid Inequality to prove concentration about this expected value. The Walsh-Hadamard tones appear as spikes above a constant background signal, and the quadratic algorithm learns the terms in the sparse superposition by varying the offset $a$. These terms can be peeled off in decreasing order of signal strength or processed in a list. The quadratic algorithm is a repurposing of the chirp detection algorithm commonly used in navigation radars, which is known to work extremely well in the presence of noise. Experimental results show close approach to the information theoretic lower bound on the required number of measurements. For example, numerical experiments show that the quadratic decoding algorithm is able to reconstruct greater than 40-sparse superpositions when applied to deterministic Kerdock sensing matrices with $N = 2^9$ and $C = 2^{18}$. In this case, the information theoretic lower bound is $k\log_2(1+C/k) \approx 507$.
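The arithmetic behind the quoted lower bound is easy to reproduce. The sketch below assumes $k = 40$ and $C = 2^{18}$, the column count of a Kerdock matrix with $m = 9$ (so $N = 2^9$ and $C = N^2$); the function name is hypothetical.

```python
import math

def info_lower_bound(k, C):
    """Information theoretic lower bound k * log2(1 + C/k) on the number of measurements."""
    return k * math.log2(1 + C / k)

bound = info_lower_bound(k=40, C=2**18)
print(round(bound))
```

The result is just below the $N = 512$ measurements actually taken, which is the sense in which the experiments approach the lower bound.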



Fig. 2. The number of successful reconstructions in 1000 trials versus the sparsity factor $k$ for the deterministic Kerdock sensing matrix corresponding to $m = 9$.

We now explain how the StRIP property provides performance guarantees for the Quadratic Reconstruction Algorithm. At each iteration the algorithm returns the location of one of the $k$ significant entries and an estimate for the value of that entry. The StRIP property guarantees that the estimate is within $\epsilon$ of the true value. These errors compound as the algorithm iterates, but since the chirp cross-terms and noise are uniformly distributed in the Walsh-Hadamard domain, the error in recovery is bounded by the difference between the true signal $\alpha$ and its best $k$-term approximation $\alpha_k$. More precisely, if $\Phi$ is $(k,\epsilon,\delta)$-StRIP, if the positions of the $k$ significant entries are chosen uniformly at random, if the near-zero entries and the measurement noise $\nu$ come from a Gaussian distribution, and if the Quadratic Reconstruction Algorithm is used to recover an approximation $\hat\alpha$ for $\alpha$, then

5 + e 2 

l|a - d||2 < ^ "'^'l^ + Y^r^'I'^'l^- (34) 

The role of the StRIP property is to bound the error of approximation in Step 11 of the Quadratic Reconstruction Algorithm. Note that if it were somehow possible to identify the support of $\alpha$ beforehand, then the UStRIP property would guarantee that we would be able to recover the signal values by linear regression. However, identifying the support of a $k$-sparse signal is known to be almost as hard as full reconstruction, and that is why our algorithm finds location and estimates signal value simultaneously, and does so one location at a time. Note that the error bound is of the form $\ell_2/\ell_2$:

$$\|\alpha - \hat\alpha\|_2 \le C\,\|\alpha - \alpha_k\|_2. \qquad (35)$$



This bound is tighter than the $\ell_2/\ell_1$ bounds of random ensembles [2], and the $\ell_1/\ell_1$ bounds of expander-based methods [6].

VI. Resilience to Noise 

A. Noisy Measurements 

In this section, we consider deterministic sensing matrices satisfying the hypothesis of Theorem 8 and show resilience to independent identically distributed (iid) Gaussian noise that is uncorrelated with the measured signal. Note that we have introduced the square of $(1 \pm \epsilon')$ in (36) merely to simplify the notation in the proof. (This $\epsilon'$ could be, for instance, picked so that $\epsilon'(2-\epsilon') > \epsilon$, where $\epsilon$ has the same meaning as in Theorem 8.)



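Theorem 21. Let $\Phi$ and $\alpha$ be such that

$$(1-\epsilon')^2\|\alpha\|^2 \le \Big\|\frac{1}{\sqrt{N}}\Phi\alpha\Big\|^2 \le (1+\epsilon')^2\|\alpha\|^2, \qquad (36)$$

with probability exceeding $1-\delta$, and let $f = \frac{1}{\sqrt{N}}\Phi\alpha + \nu$, where the noise samples $\nu(x)$ are iid complex Gaussian random variables with zero mean and variance $2\sigma^2$. Then, for $\gamma > 0$,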



(l-6'-7f|HP<||/|P<(l + 6' + 7f||«| 



with probability greater than 1 — 2 [3 -\- S 



7 II a II 



S{r) = 



, (37) 
where 

-1 



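Proof: First consider the probability that $\|f\|$ exceeds the upper bound in (37). Setting $g = \frac{1}{\sqrt{N}}\Phi\alpha$, we have

$$\Pr\big[\|f\| \ge (1+\epsilon'+\gamma)\|\alpha\|\big] \le \Pr\big[\|g\| + \|\nu\| \ge (1+\epsilon'+\gamma)\|\alpha\|\big] \le \Pr\big[\|g\| \ge (1+\epsilon')\|\alpha\|\big] + \Pr\big[\|\nu\| \ge \gamma\|\alpha\|\big]$$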



< 6 

= 5 



(27rCT2)W/2 

1 

(27r)^/2 



y II >7 II Q II 



2ii2/ir ) d^y 



e-||«llV2dW, 



\u\\ >7 II / o" 

The estimate for $\Pr\big[\|f\| \le (1-\epsilon'-\gamma)\|\alpha\|\big]$ is similar, and the desired bound then follows from the union bound. ■



B. Noisy Signals 

If the signal a is contaminated by white gaussian noise then 
the measurements are given by 



(38) 



where /i is complex multivariate Gaussian distributed, with 
zero mean and covariance 



(39) 



The reconstruction algorithm thus needs to recover the signal 
from the noisy measurements 



(40) 



where — -^^^ is complex multivariate Gaussian dis- 
tributed with mean zero and covariance 



N 



(41) 



The deterministic compressive sensing schemes considered in this paper have some advantage over random compressive sensing schemes in that $\frac{1}{N}\Phi\Phi^\dagger = \frac{C}{N}I_{N\times N}$, and consequently $\mathbb{E}\big(\nu'(x)\overline{\nu'(x')}\big) = \frac{C}{N}\sigma^2\delta_{x,x'}$, i.e., the noise samples on distinct measurements are independent. One can thus use the estimates of the previous subsection again. Noise of this type is of course harder to deal with; this is illustrated here by the measurement variance being a (possibly huge) factor $C/N$ larger than the source noise variance $\sigma^2$.
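The identity $\frac{1}{N}\Phi\Phi^\dagger = \frac{C}{N}I_{N\times N}$ is a direct consequence of the rows being orthogonal with unimodular entries, and it can be verified numerically. The sketch below assumes a small discrete chirp matrix as a stand-in for the deterministic families discussed here; the helper name is hypothetical.

```python
import numpy as np

def chirp_matrix(p):
    """p x p^2 discrete chirp matrix: column (r, m) is exp(2*pi*i*(r*x^2 + m*x)/p)."""
    x = np.arange(p)
    cols = [np.exp(2j * np.pi * (r * x**2 + m * x) / p)
            for r in range(p) for m in range(p)]
    return np.stack(cols, axis=1)

Phi = chirp_matrix(5)
N, C = Phi.shape
# Shape of the measurement-domain noise covariance, up to the source variance factor.
noise_shape = Phi @ Phi.conj().T / N
is_scaled_identity = bool(np.allclose(noise_shape, (C / N) * np.eye(N)))
print(is_scaled_identity, C / N)
```

Because the matrix is a multiple of the identity, white noise on the signal stays white in the measurement domain, only amplified by the factor $C/N$.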
VII. Conclusions

We have provided simple criteria that, when satisfied by a deterministic sensing matrix, guarantee successful recovery of all but an exponentially small fraction of $k$-sparse signals. These criteria are satisfied by many families of deterministic sensing matrices, including those formed from subcodes of the second order binary Reed-Muller codes. The criteria also apply to random Fourier ensembles, where they improve known bounds on the number of measurements required for sparse reconstruction. Our proof of unique reconstruction uses a version of the classical McDiarmid Inequality that may be of independent interest.

We have described a reconstruction algorithm for Reed-Muller sensing matrices that takes special advantage of the code structure. Our algorithm requires only vector-vector multiplication in the measurement domain, and as a result, reconstruction complexity is only quadratic in the number of measurements. This improves upon standard reconstruction algorithms such as Basis and Matching Pursuit that require matrix-vector multiplication and have complexity that is superlinear in the dimension of the data domain.

Appendix A
Properties of Delsarte-Goethals Sensing Matrices

First we prove that the columns of the $r$-th Delsarte-Goethals sensing matrix form a group under pointwise multiplication.

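Proposition A.1. Let $\mathcal{G} = \mathcal{G}(m,r)$ be the set of column vectors $\varphi_{P,b}$ where

$$\varphi_{P,b}(x) = i^{\,\mathrm{wt}(d_P)+2\mathrm{wt}(b)}\; i^{\,xPx^T+2bx^T}, \quad \text{for } x \in \mathbb{F}_2^m,$$

where $b \in \mathbb{F}_2^m$ and where the binary symmetric matrix $P$ varies over the Delsarte-Goethals set $DG(m,r)$. Then $\mathcal{G}$ is a group of order $2^{(r+2)m}$ under pointwise multiplication.

Proof: We have

$$\varphi_{P,b}(x)\,\varphi_{P',b'}(x) = i^{\,\mathrm{wt}(d_P)+\mathrm{wt}(d_{P'})+2\mathrm{wt}(b\oplus b')}\; i^{\,x(P+P')x^T+2(b\oplus b')x^T},$$

where $\oplus$ is used to emphasize addition in $\mathbb{F}_2^m$. Write $P + P' = (P\oplus P') + 2Q \pmod 4$, where $Q$ is a binary symmetric matrix. Observe that $2xQx^T = 2d_Qx^T \pmod 4$, where the diagonal $d_Q = d_P * d_{P'}$ is the pointwise product of $d_P$ and $d_{P'}$. Thus

$$\varphi_{P,b}(x)\,\varphi_{P',b'}(x) = i^{\,[\mathrm{wt}(d_P)+\mathrm{wt}(d_{P'})+2\mathrm{wt}(d_P*d_{P'})]+2\mathrm{wt}(b\oplus b'\oplus d_P*d_{P'})}\; i^{\,x(P\oplus P')x^T+2(b\oplus b'\oplus d_P*d_{P'})x^T} = \varphi_{P\oplus P',\,b\oplus b'\oplus d_P*d_{P'}}(x),$$

and $\mathcal{G}$ is closed under pointwise multiplication. Hence the possible inner products of columns $\varphi_{P,b}$, $\varphi_{P',b'}$ are exactly the possible column sums for columns $\varphi_{Q,b}$ where $Q = P\oplus P'$. ■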
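The closure rule in the proof can be verified exhaustively for a tiny case. The sketch below assumes the column formula $\varphi_{P,b}(x) = i^{\mathrm{wt}(d_P)+2\mathrm{wt}(b)}\,i^{xPx^T+2bx^T}$ as reconstructed from the proof, takes the Delsarte-Goethals set to be all binary symmetric $2\times 2$ matrices (an assumption made so that the set is closed under $\oplus$), and uses hypothetical helper names.

```python
import itertools
import numpy as np

def bits(x, m):
    """Binary m-tuple of the integer x, least significant bit first."""
    return np.array([(x >> j) & 1 for j in range(m)])

def symmetric_matrices(m):
    """All binary symmetric m x m matrices."""
    idx = [(i, j) for i in range(m) for j in range(i, m)]
    for vals in itertools.product([0, 1], repeat=len(idx)):
        P = np.zeros((m, m), dtype=int)
        for (i, j), v in zip(idx, vals):
            P[i, j] = P[j, i] = v
        yield P

def column(P, b, m):
    """phi_{P,b}(x) = i^(wt(d_P) + 2 wt(b)) * i^(x P x^T + 2 b x^T), exponent mod 4."""
    pre = int(np.diag(P).sum() + 2 * b.sum())
    return np.array([1j ** int((pre + bits(x, m) @ P @ bits(x, m) + 2 * (b @ bits(x, m))) % 4)
                     for x in range(2**m)])

def product_label(P, b, P2, b2):
    """Label (P xor P', b xor b' xor d_P * d_P') of the pointwise product."""
    return (P + P2) % 2, (b + b2 + np.diag(P) * np.diag(P2)) % 2

m = 2
labels = [(P, np.array(b)) for P in symmetric_matrices(m)
          for b in itertools.product([0, 1], repeat=m)]
closed = all(
    np.allclose(column(P, b, m) * column(P2, b2, m),
                column(*product_label(P, b, P2, b2), m))
    for P, b in labels for P2, b2 in labels)
print(len(labels), closed)
```

For $m = 2$ this checks all $32 \times 32$ pairs of columns against the label predicted by the proof.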
Next we verify property (St3). 

Proposition A.2. Let $Q$ be a binary symmetric $m \times m$ matrix with rank $r$ and let $b \in \mathbb{F}_2^m$. If

$$S = \sum_x i^{\,xQx^T+2bx^T}$$

then either $S = 0$ or

$$S^2 = i^{\,z_1Qz_1^T+2bz_1^T}\,2^{2m-r}, \quad \text{where } z_1Q = d_Q.$$

Proof: We have

$$S^2 = \sum_{x,y} i^{\,xQx^T+yQy^T+2b(x+y)^T} = \sum_{x,y} i^{\,(x+y)Q(x+y)^T+2xQy^T+2b(x+y)^T}.$$

Changing variables to $z = x\oplus y$ and $y$ gives

$$S^2 = \sum_z i^{\,zQz^T+2bz^T}\sum_y (-1)^{(d_Q+zQ)y^T}.$$

Since the diagonal $d_Q$ of a binary symmetric matrix $Q$ is contained in the row space of $Q$, there exists a solution $z_1$ to $zQ = d_Q$. The solutions to the equation $zQ = 0$ form a vector space $E$ of dimension $m-r$, and for all $e, f \in E$

$$eQe^T + fQf^T = (e+f)Q(e+f)^T \pmod 4.$$

Hence

$$S^2 = 2^m\sum_{e\in E} i^{\,(z_1+e)Q(z_1+e)^T+2(z_1+e)b^T}.$$

The map $e \mapsto eQe^T$ is a linear map from $E$ to $\mathbb{Z}_2$, so the exponent $eQe^T+2eb^T$ also determines a linear map from $E$ to $\mathbb{Z}_2$ (here we identify $\mathbb{Z}_2$ and $2\mathbb{Z}_4$). If this linear map is the zero map then

$$S^2 = 2^{2m-r}\,i^{\,z_1Qz_1^T+2bz_1^T},$$

and if it is not zero then $S = 0$. Note that given $e \mapsto eQe^T$, there are $2^r$ ways to choose $b$ so that $e \mapsto eQe^T+2eb^T$ is the zero map. ■
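Proposition A.2 can be checked by brute force for small $m$. The sketch below enumerates every binary symmetric $3\times 3$ matrix $Q$ and every $b$, computes $S$ directly, and confirms that $|S|^2$ is either $0$ or $2^{2m-r}$ with $r$ the rank of $Q$ over $\mathbb{F}_2$. All helper names are hypothetical.

```python
import itertools
import numpy as np

def bits(x, m):
    """Binary m-tuple of the integer x, least significant bit first."""
    return np.array([(x >> j) & 1 for j in range(m)])

def symmetric_matrices(m):
    """All binary symmetric m x m matrices."""
    idx = [(i, j) for i in range(m) for j in range(i, m)]
    for vals in itertools.product([0, 1], repeat=len(idx)):
        Q = np.zeros((m, m), dtype=int)
        for (i, j), v in zip(idx, vals):
            Q[i, j] = Q[j, i] = v
        yield Q

def gf2_rank(M):
    """Rank over GF(2) by elimination on rows encoded as integers."""
    rows = [int("".join(str(int(v)) for v in row), 2) for row in M]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot            # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def column_sum(Q, b, m):
    """S = sum over x of i^(x Q x^T + 2 b x^T), exponent taken mod 4."""
    return sum(1j ** int((bits(x, m) @ Q @ bits(x, m) + 2 * (b @ bits(x, m))) % 4)
               for x in range(2**m))

def check_column_sums(m):
    for Q in symmetric_matrices(m):
        r = gf2_rank(Q)
        for b in itertools.product([0, 1], repeat=m):
            s2 = abs(column_sum(Q, np.array(b), m)) ** 2
            if not (np.isclose(s2, 0.0) or np.isclose(s2, 2 ** (2 * m - r))):
                return False
    return True

all_ok = check_column_sums(3)
print(all_ok)
```

In the notation of (St3), a full-rank $Q$ gives $|S|^2 = 2^m = N$, so the corresponding columns satisfy the condition with $\eta = 1$.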
The $0$-th Delsarte-Goethals sensing matrix is a matrix with $N = 2^m$ rows and $N^2$ columns. These columns are the union of $N$ mutually unbiased bases, where vectors in one orthogonal basis look like noise to all other orthogonal bases.

Appendix B 
Generalized McDiarmid's Inequality 

The method of "independent bounded differences" ([35]) gives large-deviation concentration bounds for multivariate functions in terms of the maximum effect on the function value of changing just one coordinate. This method has been widely used in combinatorial applications, and in learning theory. In this appendix, we prove that a modification of McDiarmid's inequality also holds for distinct (in contrast to independent) random variables; our proof consists again in forming martingale sequences.

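We first introduce some notation. Let $X_1,\ldots,X_m$ be probability spaces and define $\mathcal{X}$ as the probability space of all distinct $m$-tuples, that is,

$$\mathcal{X} = \Big\{(x_1,\ldots,x_m) \in \prod_{i=1}^{m}X_i \ \text{ such that } \forall\, i \neq j:\ x_i \neq x_j\Big\}. \qquad (42)$$

(This definition is spelled out in more detail at the end of subsection 3.1.2.) Let $f(x_1,\ldots,x_m)$ be a function from $\mathcal{X}$ to $\mathbb{R}$, and let $f(X_1,\ldots,X_m)$ be the corresponding random variable on $\mathcal{X}$. Denote by $X_{1\to i}$ the $i$-tuple of random variables $(X_1,\ldots,X_i)$ on the probability space $\mathcal{X}$. (The "complete" $m$-tuple $(X_1,\ldots,X_m)$ will also be denoted by just $X$.) Analogously, define $X_{(i+1)\to m}$ to be the $(m-i)$-tuple of random variables $(X_{i+1},\ldots,X_m)$. We shall also use the notations $x_{1\to i} = (x_1,\ldots,x_i) \in \prod_{\ell=1}^{i}X_\ell$, $\mathcal{X}_{1\to i} = \{x_{1\to i} \in \prod_{\ell=1}^{i}X_\ell;\ x_\ell \neq x_n \text{ if } \ell \neq n\}$; $x_{(i+1)\to m} \in \prod_{\ell=i+1}^{m}X_\ell$ and $\mathcal{X}_{(i+1)\to m} \subset \prod_{\ell=i+1}^{m}X_\ell$ are defined analogously.

Theorem B.1 (Self-avoiding McDiarmid inequality). Let $\mathcal{X}$ be the probability space defined in Equation (42), and let $f: \mathcal{X} \to \mathbb{R}$ be a function such that for any index $i$, and any $x_{1\to(i-1)} \in \mathcal{X}_{1\to(i-1)}$,

$$\sup_{u\in X_i;\ u\neq x_n,\ n=1,\ldots,i-1}\mathbb{E}\big[f(x_1,\ldots,x_{i-1},u,X_{i+1},\ldots,X_m)\big] - \inf_{l\in X_i;\ l\neq x_n,\ n=1,\ldots,i-1}\mathbb{E}\big[f(x_1,\ldots,x_{i-1},l,X_{i+1},\ldots,X_m)\big] \le c_i. \qquad (43)$$

Then for any positive $\epsilon$,

$$\Pr\Big[\big|f(X_1,\ldots,X_m) - \mathbb{E}[f(X_1,\ldots,X_m)]\big| \ge \epsilon\Big] \le 2\exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m}c_i^2}\right). \qquad (44)$$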



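Our proof will invoke Hoeffding's Lemma [35]:

Proposition B.2 (Hoeffding's Lemma). Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $a \le X \le b$; then for $t > 0$,

$$\mathbb{E}\big[e^{tX}\big] \le \exp\left(\frac{t^2(b-a)^2}{8}\right).$$

In our proof we will also make use of the functions

$$Z_i(x_{1\to i}) = \mathbb{E}\big[f(X)\,\big|\,X_{1\to i} = x_{1\to i}\big], \quad \text{where } x_{1\to i} \in \mathcal{X}_{1\to i}.$$

As a result of (43), for all $x_{1\to(i-1)}$ in $\mathcal{X}_{1\to(i-1)}$,

$$\sup_u Z_i(x_{1\to(i-1)},u) - \inf_l Z_i(x_{1\to(i-1)},l)$$

is less than $c_i$. This implies, for all $x_{1\to i} \in \mathcal{X}_{1\to i}$,

$$-c_i \le \inf_{l} Z_i(x_{1\to(i-1)},l) - \sup_{u} Z_i(x_{1\to(i-1)},u) \qquad (45)$$

$$\le Z_i(x_{1\to i}) - \mathbb{E}\big[f(x_{1\to(i-1)},X_i,X_{(i+1)\to m})\,\big|\,x_{1\to(i-1)}\big] \qquad (46)$$

$$= Z_i(x_{1\to i}) - Z_{i-1}(x_{1\to(i-1)})$$

$$\le Z_i(x_{1\to i}) - \inf_l Z_i(x_{1\to(i-1)},l) \qquad (47)$$

$$\le \sup_u Z_i(x_{1\to(i-1)},u) - \inf_l Z_i(x_{1\to(i-1)},l) \qquad (48)$$

$$\le c_i,$$

or

$$\big|Z_i(x_{1\to i}) - Z_{i-1}(x_{1\to(i-1)})\big| \le c_i. \qquad (49)$$

Until now, we have viewed each $Z_i$ as a function on the subset $\mathcal{X}_{1\to i}$ of $\prod_{\ell=1}^{i}X_\ell$; it is straightforward to lift the $Z_i$ to functions on all of $\mathcal{X}$. The $Z_i(X_{1\to i}) = Z_i(X)$ can also be considered as random variables on $\mathcal{X}$, depending only on the first $i$ components of $X$,

$$Z_i(X) = \mathbb{E}_{X_{(i+1)\to m}}\big[f(X)\,\big|\,X_{1\to i}\big].$$

(The subscript $X_{(i+1)\to m}$ on the expectation indicates that one averages only with respect to the variables listed in the subscript, in this case the last $m-i$ variables. We adopt this subscript convention in what follows; only expectations without subscript are with respect to the whole probability space $\mathcal{X}$.)

Viewing the $Z_i$ as random variables, we observe that $Z_0 = \mathbb{E}[f(X_1,\ldots,X_m)]$, and that $Z_m = f(X_1,\ldots,X_m)$. Because of the restriction to $\mathcal{X}$, the random variables $X_\ell$, $Z_i$ are not independent. However, with respect to averaging over $X_i$, the $Z_i$, $i = 1,\ldots,m$, constitute a martingale in the following sense:

$$\mathbb{E}_{X_i}\big[Z_i(X)\,\big|\,X_{1\to(i-1)}\big] = Z_{i-1}(X). \qquad (50)$$

Proof: Using Markov's inequality, we see that for any positive $t$,

$$\Pr\big[f - \mathbb{E}[f] \ge \epsilon\big] = \Pr\big[e^{t(f-\mathbb{E}[f])} \ge e^{t\epsilon}\big] \le e^{-t\epsilon}\,\mathbb{E}\big[e^{t(f-\mathbb{E}[f])}\big]. \qquad (51)$$

Since $f - \mathbb{E}[f] = Z_m - Z_0$, we can rewrite this as

$$\mathbb{E}\big[e^{t(f-\mathbb{E}[f])}\big] = \mathbb{E}\left[\exp\Big(t\sum_{i=1}^{m}(Z_i - Z_{i-1})\Big)\right].$$

By marginalization of the expectation,

$$\mathbb{E}\left[\exp\Big(t\sum_{i=1}^{m}(Z_i - Z_{i-1})\Big)\right] = \mathbb{E}\left[\exp\Big(t\sum_{i=1}^{m-1}(Z_i - Z_{i-1})\Big)\,\mathbb{E}_{X_m}\big[e^{t(Z_m - Z_{m-1})}\,\big|\,X_{1\to(m-1)}\big]\right],$$

where we have used that each $Z_i$ depends on only the first $i$ components of $X$, so that only $(Z_m - Z_{m-1})$ is affected by the averaging over $X_m$.

By (49), we have, for all $x_{1\to i} \in \mathcal{X}_{1\to i}$, $|Z_i(x_{1\to i}) - Z_{i-1}(x_{1\to(i-1)})| \le c_i$, which can also be rewritten as $-c_i \le Z_i(X) - Z_{i-1}(X) \le c_i$; more precisely, by (45)-(48), for fixed $x_{1\to(i-1)}$ the difference $Z_i - Z_{i-1}$ takes values in an interval of length at most $c_i$.

Because of the martingale property (50) we have

$$\mathbb{E}\big[Z_i - Z_{i-1}\,\big|\,X_{1\to(i-1)}\big] = \mathbb{E}_{X_{i\to m}}\big[Z_i - Z_{i-1}\,\big|\,X_{1\to(i-1)}\big] = \mathbb{E}_{X_i}\big[Z_i - Z_{i-1}\,\big|\,X_{1\to(i-1)}\big] = 0.$$

Combining these last two observations with Hoeffding's Lemma [35] we conclude

$$\mathbb{E}\left[\exp\Big(t\sum_{i=1}^{m}(Z_i - Z_{i-1})\Big)\right] \le \exp\Big(\frac{t^2c_m^2}{8}\Big)\,\mathbb{E}\left[\exp\Big(t\sum_{i=1}^{m-1}(Z_i - Z_{i-1})\Big)\right] \le \exp\Big(\frac{t^2}{8}\sum_{i=1}^{m}c_i^2\Big).$$

Substituting this into (51) we obtain

$$\Pr\big[f - \mathbb{E}[f] \ge \epsilon\big] \le \exp\Big(-t\epsilon + \frac{1}{8}t^2\sum_{i=1}^{m}c_i^2\Big). \qquad (52)$$

Since equation (52) is valid for any $t > 0$, we can optimize over $t$. By substituting the value $t = 4\epsilon\big(\sum_{i=1}^{m}c_i^2\big)^{-1}$ we get

$$\Pr\big[f - \mathbb{E}[f] \ge \epsilon\big] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m}c_i^2}\right).$$

Replacing the function $f$ by $\mathbb{E}[f] - f$, it follows that $\Pr\big[f - \mathbb{E}[f] \le -\epsilon\big] \le \exp\big(-2\epsilon^2/\sum_{i=1}^{m}c_i^2\big)$; union bounds therefore imply that

$$\Pr\big[\,|f - \mathbb{E}[f]| \ge \epsilon\,\big] \le 2\exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m}c_i^2}\right). \qquad ■$$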
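A small simulation can illustrate the kind of statement the Self-avoiding McDiarmid inequality makes. The sketch below is an assumed toy setting, not an experiment from the paper: $f$ is the sum of $m$ values drawn without replacement (so the arguments are distinct rather than independent) from a population with values in $[0,1]$, for which each $c_i$ can be taken to be 1. The function name and all parameters are hypothetical.

```python
import numpy as np

def self_avoiding_mcdiarmid_bound(eps, c):
    """Evaluate 2*exp(-2*eps^2 / sum(c_i^2)), the right hand side of (44)."""
    c = np.asarray(c, dtype=float)
    return float(2.0 * np.exp(-2.0 * eps**2 / np.sum(c**2)))

rng = np.random.default_rng(0)
population = rng.uniform(0.0, 1.0, size=100)   # values g(1), ..., g(100) in [0, 1]
m, eps, trials = 20, 5.0, 2000                 # assumed toy parameters

# f(x_1, ..., x_m) = sum of g(x_i) over m distinct indices.
sums = np.array([population[rng.choice(100, size=m, replace=False)].sum()
                 for _ in range(trials)])
expected = m * population.mean()
empirical_tail = float(np.mean(np.abs(sums - expected) >= eps))
bound = self_avoiding_mcdiarmid_bound(eps, np.ones(m))
print(empirical_tail, bound)
```

The empirical tail is far below the bound here, as expected: the inequality is a worst-case statement over all functions with the given $c_i$.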



